Abstract. The computational complexity of the isomorphism problem for
regular trees, regular linear orders, and regular words is analyzed. A tree is
regular if it is isomorphic to the prefix order on a regular language. In case
regular languages are represented by NFAs (DFAs), the isomorphism problem for
regular trees turns out to be EXPTIME-complete (resp. P-complete). In case the
input automata are acyclic NFAs (acyclic DFAs), the corresponding trees are
(succinctly represented) finite trees, and the isomorphism problem turns out
to be PSPACE-complete (resp. P-complete). A linear order is regular if it is
isomorphic to the lexicographic order on a regular language. A polynomial time
algorithm for the isomorphism problem for regular linear orders (and even
regular words, which generalize the latter) given by DFAs is presented. This
solves an open problem by Esik and Bloom.



1 Introduction 

Isomorphism problems for infinite but finitely presented structures are an active re- 
search topic in algorithmic model theory [1]. It is a folklore result in computable model 
theory that the isomorphism problem for computable structures (i.e., structures, where 
the domain is a computable set of natural numbers and all relations are computable too) 
is highly undecidable; more precisely, it is $\Sigma^1_1$-complete, i.e., complete for the first
existential level of the analytical hierarchy. Khoussainov et al. proved in [17] that even
for automatic structures (i.e., structures, where the domain is a regular set of words
and all relations can be recognized by synchronous multitape automata), the isomorphism
problem is $\Sigma^1_1$-complete. In [19], this result was further improved to automatic
order trees and automatic linear orders. On the decidability side, Courcelle proved that 
the isomorphism problem for equational graphs is decidable [7]. Recall that a graph is 
equational if it is the least solution of a system of equations over the HR graph opera- 
tions. We remark that Courcelle's algorithm for the isomorphism problem for equational 
graphs has very high complexity (it is not elementary), since it uses the decidability of 
monadic second-order logic on equational graphs. 

In this paper, we continue the investigation of isomorphism problems for infinite 
but finitely presented structures at the lower end of the spectrum. We focus on two very
simple classes of infinite structures: regular trees and regular words. Both are particular 
automatic structures. Recall that a countable tree is regular if it has only finitely many 
subtrees up to isomorphism. This definition works for ordered trees (where the children 
of a node are linearly ordered) and unordered trees. An equivalent characterization in 
the unordered case uses regular languages: An unordered (countable) tree T is regular 



if and only if there is a regular language $L \subseteq \Sigma^*$ which contains the empty word and
such that T is isomorphic to the tree obtained by taking the prefix order on L (the empty
word is the root of the tree). Hence, a regular tree can be represented by a finite
deterministic or nondeterministic automaton (DFA or NFA), and the isomorphism problem
for regular trees becomes the following computational problem: Given two DFAs
(resp., NFAs) accepting both the empty word, are the corresponding regular trees
isomorphic? It is not difficult to prove that this problem can be solved in polynomial
time if the two input automata are assumed to be DFAs; the algorithm is very similar
to the well-known partition refinement algorithm for checking bisimilarity of finite
state systems [15], see Section 3.1. Hence, the isomorphism problem for regular trees
that are represented by NFAs can be solved in exponential time. Our first main result
states that this problem is in fact EXPTIME-complete, see Section 3.2. The proof of the
EXPTIME lower bound uses three main ingredients: (i) EXPTIME coincides with
alternating polynomial space [5], (ii) a construction from [14], which reduces the evaluation
problem for Boolean expressions to the isomorphism problem for (finite) trees, and (iii)
a small NFA accepting all words that do not represent an accepting computation of a
polynomial space machine [28].¹ Our proof technique yields another result too: It is
PSPACE-complete to check for two given acyclic NFAs $A_1$, $A_2$ (both accepting the
empty word), whether the trees that result from the prefix orders on $L(A_1)$ and $L(A_2)$,
respectively, are isomorphic. Note that these two trees are clearly finite (since the
automata are acyclic), but the size of $L(A_1)$ can be exponential in the number of states of
$A_1$. In this sense, acyclic NFAs can be seen as a succinct representation of finite trees.
The PSPACE upper bound for acyclic NFAs follows easily from Lindell's result [21]
that isomorphism of explicitly given trees can be checked in logarithmic space.

The second part of this paper studies the isomorphism problem for regular words,
which were introduced in [6]. A generalized word over an alphabet $\Sigma$ is a countable
linear order together with a $\Sigma$-coloring of the elements. A generalized word is regular
if it can be obtained as the least solution (in a certain sense made precise in [6])
of a system $X_1 = t_1, \ldots, X_n = t_n$. Here, every $t_i$ is a finite word over the alphabet
$\Sigma \cup \{X_1, \ldots, X_n\}$. For instance, the system $X = abX$ defines the regular word $(ab)^\omega$.
Courcelle [6] gave an alternative characterization of regular words: A generalized word
is regular if and only if it is equal to the frontier word of a finitely-branching ordered
regular tree, where the leaves are colored by symbols from $\Sigma$. Here, the frontier word
is obtained by ordering the leaves in the usual left-to-right order (note that the tree is
ordered). Alternatively, a regular word can be represented by a DFA A, where the set of
final states is partitioned into sets $F_a$ ($a \in \Sigma$); we call such a DFA a partitioned DFA.
The corresponding regular word is obtained by ordering the language of A lexicographically
and coloring a word $w \in L(A)$ with a if w leads from the initial state to a state
from $F_a$. A third characterization of regular words was provided by Heilbrunner [13]:
A generalized word is regular if it can be obtained from singleton words (i.e., symbols
from $\Sigma$) using the operations of concatenation, $\omega$-power, $\overline{\omega}$-power and dense shuffle.
For a generalized word u, its $\omega$-power (resp. $\overline{\omega}$-power) is the generalized word $uuu\cdots$
(resp. $\cdots uuu$). Moreover, the shuffle of generalized words $u_1, \ldots, u_n$ is obtained by
¹ This construction is used in [28] to prove that the universality problem for NFAs is
PSPACE-complete.
choosing a dense coloring of the rationals with colors $\{1, \ldots, n\}$ (up to isomorphism,
there is only a single such coloring [26]) and then replacing every i-colored rational
by $u_i$. In fact, Heilbrunner presents an algorithm which computes from a given system
of equations (or, alternatively, a partitioned DFA) an expression over the above set of
operations (called a regular expression in the following) which defines the least solution
of the system of equations. A simple analysis of Heilbrunner's algorithm shows
that the computed regular expression in general has exponential size with respect to
the input system of equations and it is easy to see that this cannot be avoided.² The
next step was taken by Thomas in [29], where he proved that the isomorphism problem 
for regular words is decidable. For his proof, he uses the decidability of the monadic 
second-order theory of linear orders; hence his proof does not yield an elementary upper
bound for the isomorphism problem for regular words. Such an algorithm was presented
later by Bloom and Esik in [2], where the authors present a polynomial time algorithm
for checking whether two given regular expressions define isomorphic regular words.
Together with Heilbrunner's algorithm, this yields an exponential time algorithm for
checking whether the least solutions of two given systems of equations (or, alternatively,
the regular words defined by two partitioned DFAs) are isomorphic. It was asked
in [2], whether a polynomial time algorithm for this problem exists. Our second main 
result answers this question affirmatively. In fact, we prove that the problem, whether 
two given partitioned DFAs define isomorphic regular words, is P-complete. A large 
part of this paper deals with the polynomial time upper bound. The first step is simple. 
By reanalyzing Heilbrunner's algorithm, it is easily seen that from a given partitioned 
DFA (defining a regular word u) one can compute in polynomial time a succinct repre- 
sentation of a regular expression for u. This succinct representation consists of a DAG 
(directed acyclic graph), whose unfolding is a regular expression for u. The second and 
main step of the proof shows that the polynomial time algorithm of Bloom and Esik 
for regular expressions can be refined in such a way that it works (in polynomial time) 
for succinct regular expressions too. The main tool in our proof is (besides the machin- 
ery from [2]) algorithmics on compressed strings (see [27] for a survey), in particular 
Plandowski's result that equality of strings that are represented by straight-line pro- 
grams (i.e., context free grammars that only generate a single word) can be checked in 
polynomial time [24]. It is a simple observation that an acyclic partitioned DFA is basi- 
cally a straight-line program. Hence, we show how to extend Plandowski's polynomial 
time algorithm from acyclic partitioned DFAs to general partitioned DFAs. 

An immediate corollary of our result is that it can be checked in polynomial time 
whether the lexicographic orderings on the languages defined by two given DFAs (so 
called regular linear orderings) are isomorphic. For the special case that the two input
DFAs accept well-ordered languages, this was shown in [8]. Let us mention that it is
highly undecidable ($\Sigma^1_1$-complete) to check, whether the lexicographic orderings on
the languages defined by two given deterministic pushdown automata (these are the 
algebraic linear orderings [3]) are isomorphic [19]. 



² Take for instance the system $X_i = X_{i+1}X_{i+1}$ ($1 \le i < n$), $X_n = a$, which defines the finite
word $a^{2^{n-1}}$.
2 Preliminaries

For an equivalence relation R on a set A and $a \in A$ we denote with $[a]_R$ the equivalence
class containing a. Moreover, $[A]_R = \{[a]_R \mid a \in A\}$. Let us take a finite alphabet $\Sigma$.
The length of a finite word $u \in \Sigma^*$ is denoted by $|u|$. Let $\Sigma^+ = \{u \in \Sigma^* \mid |u| > 0\}$,
$\Sigma^k = \{u \in \Sigma^* \mid |u| = k\}$, $\Sigma^{\le k} = \{u \in \Sigma^* \mid |u| \le k\}$, and $\Sigma^{\ge k} = \{u \in \Sigma^* \mid |u| \ge k\}$.
For $u, v \in \Sigma^*$, we write $u \le_{\mathrm{pref}} v$ if there exists $w \in \Sigma^*$ with $v = uw$, i.e., u is
a prefix of v. We write $u <_{\mathrm{pref}} v$ if $u \le_{\mathrm{pref}} v$ and $u \ne v$. For a language $L \subseteq \Sigma^*$ let
$\mathrm{pref}(L) = \{u \in \Sigma^* \mid \exists v \in L : u \le_{\mathrm{pref}} v\}$. For a fixed linear order $\le$ on the alphabet
$\Sigma$ we define the lexicographic order $\le_{\mathrm{lex}}$ on $\Sigma^*$ as follows: $u \le_{\mathrm{lex}} v$ if $u \le_{\mathrm{pref}} v$ or
there exist words w, x, y and $a, b \in \Sigma$ such that $a < b$, $u = wax$, and $v = wby$.
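
The prefix order, the prefix closure, and the lexicographic order defined above can be illustrated by a minimal Python sketch. The function names and the encoding of the alphabet order as a string `order` (smallest symbol first) are assumptions made for this illustration only.

```python
def is_prefix(u: str, v: str) -> bool:
    """u <=_pref v: there is a word w with v = u w."""
    return v.startswith(u)


def pref(language: set) -> set:
    """pref(L): all prefixes of words of a finite language L."""
    return {w[:i] for w in language for i in range(len(w) + 1)}


def lex_le(u: str, v: str, order: str) -> bool:
    """u <=_lex v for the alphabet order given by `order` (smallest symbol first)."""
    rank = {c: i for i, c in enumerate(order)}
    if is_prefix(u, v):
        return True
    for a, b in zip(u, v):
        if a != b:
            # u = w a x and v = w b y with a common prefix w
            return rank[a] < rank[b]
    return False  # v is a proper prefix of u
```

For example, with `order = "ab"` the word `aab` precedes `ab`, and every word precedes each of its extensions.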

2.1 Complexity theory 

We assume that the reader has some basic background in complexity theory, in particular
concerning the complexity classes NL, P, PSPACE, and EXPTIME, see e.g. [23].
All completeness results in this paper refer to logspace reductions. 

A PSPACE-transducer is a deterministic Turing machine with a read-only input
tape, a write-only output tape and a work tape, whose length is bounded by $n^{O(1)}$,
where n is the input length. The output is written from left to right on the output tape,
i.e., in each step the transducer either outputs a new symbol on the output tape, in which
case the output head moves one cell to the right, or the transducer does not output a new
symbol in which case the output head does not move. Moreover, we assume that the
transducer terminates for every input. This implies that a PSPACE-transducer computes
a mapping $f : \Sigma^* \to \Theta^*$, where $|f(w)|$ is bounded by $2^{|w|^{O(1)}}$. We need the following
simple lemma:

Lemma 2.1. Assume that the mapping $f : \Sigma^* \to \Theta^*$ can be computed by a PSPACE-transducer
and let $L \subseteq \Theta^*$ be a language in $\mathrm{NSPACE}(\log^k(n))$ for some constant k.
Then $f^{-1}(L)$ belongs to PSPACE.

Proof. The proof uses the same idea that shows that the composition of two logspace
computable mappings is again logspace computable. Let $w \in \Sigma^*$ be an input. Basically,
we run the $\mathrm{NSPACE}(\log^k(n))$-algorithm for L on the input $f(w)$. But since f can be
computed by a PSPACE-transducer (which can generate an exponentially long output)
the length of $f(w)$ can be only bounded by $2^{|w|^{O(1)}}$. Hence, we cannot construct $f(w)$
explicitly. But this is not necessary. We only store a pointer to some position of $f(w)$ (this
pointer needs space $|w|^{O(1)}$) while running the $\mathrm{NSPACE}(\log^k(n))$-algorithm for L.
Each time this algorithm needs the i-th letter of $f(w)$, we run the PSPACE-transducer
for f until the i-th output symbol is generated. The first $i - 1$ symbols of $f(w)$ are not
written on the output tape. Note that the $\mathrm{NSPACE}(\log^k(n))$-algorithm for L needs space
$\log^k(2^{|w|^{O(1)}}) = |w|^{O(1)}$ while running on $f(w)$. Hence, the total space requirement is
bounded by $|w|^{O(1)}$. □
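
The recomputation idea in this proof can be sketched in Python. This is only an illustration under simplifying assumptions: the transducer is modelled as a generator function, the toy transducer `repeat_input` and the consumer `count_symbol` are hypothetical, and Python of course does not enforce any space bound. The point is that the consumer never stores $f(w)$; it keeps a position counter and re-runs the transducer whenever it needs a symbol.

```python
from typing import Callable, Iterator, Optional


def symbol_at(transducer: Callable[[str], Iterator[str]], w: str, i: int) -> Optional[str]:
    """Return the i-th output symbol (0-based) of f(w), or None past the end.

    The transducer is restarted and its first i symbols are discarded,
    so the output is never stored as a whole.
    """
    for pos, sym in enumerate(transducer(w)):
        if pos == i:
            return sym
    return None


def repeat_input(w: str) -> Iterator[str]:
    """Toy transducer (hypothetical): outputs w repeated 2^|w| times."""
    for _ in range(2 ** len(w)):
        yield from w


def count_symbol(w: str, target: str) -> int:
    """Toy consumer: scans f(w) left to right using only a position counter."""
    i, count = 0, 0
    while (s := symbol_at(repeat_input, w, i)) is not None:
        if s == target:
            count += 1
        i += 1
    return count
```

The price for the small memory footprint is time: every access restarts the transducer, which is acceptable here because only space is bounded.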

An alternating Turing machine is an ordinary nondeterministic Turing machine, where 
in addition the set of states Q is partitioned into existential states ($Q_\exists$) and universal
states ($Q_\forall$). A configuration, where the current state is existential (resp., universal) is
called an existential (resp., universal) configuration. Let us assume that M is an alter- 
nating Turing machine without infinite computation paths. Then, we define inductively 
the notion of an accepting configuration as follows: If c is an existential configuration, 
then c is accepting if and only if c has an accepting successor configuration. If c is a 
universal configuration, then c is accepting if and only if all successor configurations of 
c are accepting. Note that a universal configuration without successor configurations is 
accepting, whereas an existential configuration without successor configurations is not 
accepting. An input x is accepted by M (briefly, $x \in L(M)$) if and only if the initial
configuration with input x is accepting. 
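
The inductive definition of accepting configurations can be sketched as follows. The sketch assumes that the (finite, acyclic) configuration graph is given explicitly as a dictionary `successors`, which is a simplification made for illustration; configurations only need to be hashable.

```python
from functools import lru_cache


def make_acceptance_check(successors: dict, existential: set):
    """Build a function deciding whether a configuration is accepting.

    successors:  maps a configuration to the list of its successor configurations
                 (the graph must be acyclic, i.e. no infinite computation paths)
    existential: the set of existential configurations; all others are universal
    """
    @lru_cache(maxsize=None)
    def accepts(c) -> bool:
        results = (accepts(d) for d in successors.get(c, ()))
        # any() of nothing is False, all() of nothing is True, which matches
        # the convention for configurations without successors.
        return any(results) if c in existential else all(results)

    return accepts
```

In particular, a universal configuration without successors is accepting and an existential one is not, exactly as noted above.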

The complexity class $C_{=}P$ consists of all languages $L \subseteq \Sigma^*$ such that there exist
nondeterministic polynomial time Turing machines $M_1$ and $M_2$ with input alphabet
$\Sigma$ such that for every input $w \in \Sigma^*$: $w \in L$ if and only if the number of accepting
computations of $M_1$ on input w equals the number of accepting computations of $M_2$
on input w. If we replace in this definition nondeterministic polynomial time Turing
machines by nondeterministic logspace Turing machines, we obtain the class $C_{=}L$.

2.2 Finite automata and transducers

Let $A = (Q, \Sigma, \delta, q_0, F)$ be a nondeterministic finite automaton, briefly NFA, where Q
is the set of states, $\Sigma$ is the input alphabet, $\delta \subseteq Q \times \Sigma \times Q$ is the transition relation,
$q_0 \in Q$ is the initial state, and $F \subseteq Q$ is the set of final states. A state $q \in Q$ is accessible
(resp. coaccessible), if q can be reached from the initial state $q_0$ (resp., if a final state
from F can be reached from q). We say that A is accessible (resp., coaccessible), if
every state of A is accessible (resp., coaccessible). An NFA A is called prefix-closed
if every state of A is a final state. In that case, the language $L(A)$ is prefix-closed.
Moreover, if A is coaccessible and the prefix-closed NFA B results from A by making
every state final, then clearly $L(B) = \mathrm{pref}(L(A))$. For a DFA (deterministic finite
automaton), $\delta$ is a partial map from $Q \times \Sigma$ to Q. Sometimes, we will also deal with
NFAs (DFAs) without an initial state. If A is an NFA without an initial state and q is
a state of A, then $L(A, q)$ is the language accepted by A, when q is declared to be the
initial state. We will need the following simple lemma, which is probably folklore:

Lemma 2.2. For a given DFA $A = (Q, \Sigma, \delta, q_0, F)$, we can compute the cardinality
$|L(A)| \in \mathbb{N} \cup \{\infty\}$ in polynomial time.

Proof. W.l.o.g. we can assume that A is accessible and coaccessible. Then $L(A)$ is finite
if and only if A is acyclic. So assume that A is acyclic. Since A is deterministic, the
size of $L(A)$ equals the number of paths from $q_0$ to F. Now, in a directed acyclic graph,
the number of paths from a source node to all other nodes can be easily computed by
dynamic programming in polynomial time. □
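
The proof of Lemma 2.2 translates into a short procedure. The following Python sketch assumes a DFA encoded as a dictionary `delta` from (state, symbol) pairs to states; this encoding and the function name are choices made for this illustration. It first restricts to accessible and coaccessible states, reports $\infty$ if a cycle remains, and otherwise counts paths by memoized recursion.

```python
import math


def language_size(delta: dict, q0, finals: set):
    """Return |L(A)| as an int, or math.inf if L(A) is infinite.

    delta: partial transition map {(state, symbol): state} of a DFA
    """
    succ, pred = {}, {}
    for (p, _a), q in delta.items():
        succ.setdefault(p, []).append(q)   # one entry per transition
        pred.setdefault(q, []).append(p)

    def reach(starts, edges):
        seen, stack = set(starts), list(starts)
        while stack:
            p = stack.pop()
            for q in edges.get(p, ()):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        return seen

    # accessible and coaccessible states
    useful = reach([q0], succ) & reach(list(finals), pred)
    if q0 not in useful:
        return 0

    memo, active = {}, set()

    def paths(p):
        """Number of paths from p to a final state inside the trimmed DFA."""
        if p in memo:
            return memo[p]
        if p in active:            # a cycle through useful states
            return math.inf
        active.add(p)
        total = 1 if p in finals else 0
        for q in succ.get(p, ()):
            if q in useful:
                total += paths(q)
        active.discard(p)
        memo[p] = total
        return total

    return paths(q0)
```

Since the automaton is deterministic, distinct paths carry distinct words, so the path count equals the number of accepted words.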

A partitioned DFA is a tuple $A = (Q, \Sigma, \delta, q_0, (F_a)_{a \in \Gamma})$, where $\Gamma$ is a finite alphabet,
$B = (Q, \Sigma, \delta, q_0, \bigcup_{a \in \Gamma} F_a)$ is an ordinary DFA and $F_a \cap F_b = \emptyset$ for $a \ne b$. Since
B is a DFA, it follows that the language $L(B)$ is partitioned by the languages $L(A_a)$,
where $A_a = (Q, \Sigma, \delta, q_0, F_a)$ ($a \in \Gamma$). We use partitioned DFAs to label elements
of a structure with symbols from $\Gamma$. The language $L(A_a)$ will be the set of a-labelled
elements. We do not introduce partitioned NFAs, since for NFAs the languages $L(A_a)$
($a \in \Gamma$) would not partition $L(B)$ (thus, a point could get several labels).

A ($\varepsilon$-free) rational transducer is a tuple $T = (Q, \Sigma, \Gamma, \delta, q_0, F)$, where Q (the set
of states), $\Sigma$ (the input alphabet), and $\Gamma$ (the output alphabet) are finite sets, $q_0 \in Q$
is the initial state, F C Q is the set of final states, and S C Q x S x F~^ x Q is the 

transition relation. A transition {q, a, w,p) €6 is also written as q -^^-^ p. The rational 
transducer T defines a binary relation $[\![T]\!] \subseteq \Sigma^* \times \Gamma^*$ in the usual way. For a language
$L \subseteq \Sigma^*$ let $T(L) = \{v \in \Gamma^* \mid \exists u \in L : (u, v) \in [\![T]\!]\}$.



2.3 Trees 

A tree is a partial order $T = (A; \le)$, where $\le$ has a smallest element (the root of the
tree; in particular $A \ne \emptyset$) and for every $a \in A$, the set $\{b \in A \mid b \le a\}$ is finite
and linearly ordered by $\le$. We write $a \lessdot b$ if $a < b$ and there does not exist $c \in A$
with $a < c < b$. For $a \in A$, let $\mathrm{child}(a, T)$ (the set of children of a) be the set of all
$b \in A$ such that $a \lessdot b$. The set of leaves of T is $\mathrm{leaf}(T) = \{a \in A \mid \mathrm{child}(a, T) = \emptyset\}$.
For $a \in A$ let $T{\restriction}a$ be the subtree of T rooted at a, i.e., the set of nodes of $T{\restriction}a$ is
$\{b \in A \mid a \le b\}$. The tree T is finitely branching if $\mathrm{child}(a, T)$ is finite for all $a \in A$.
An infinite path of T is an infinite chain $a_1 \lessdot a_2 \lessdot \cdots$; finite paths are defined
analogously. If T is finite and $a \in A$, then the height of a in T is the maximal length
of a path that starts in a. For trees $T_1$ and $T_2$ we write $T_1 \cong T_2$ in case $T_1$ and $T_2$ are
isomorphic.

A tree over the finite alphabet $\Sigma$ is a pair $T = (L; \le_{\mathrm{pref}})$, where $L \subseteq \Sigma^*$ is a
language with $\varepsilon \in L$. Note that T is indeed a tree in the above sense. Most of the time,
we will identify the language L with the tree $(L; \le_{\mathrm{pref}})$. Moreover, if $L = \mathrm{pref}(L)$ (i.e.,
L is prefix-closed), then T is a finitely branching tree.
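
To make the notion of a child in a tree $(L; \le_{\mathrm{pref}})$ concrete, here is a small Python sketch for a finite language given as a set of strings. The function name and the encoding are assumptions for illustration. Note that L need not be prefix-closed, so a child of u can be longer than u by more than one symbol.

```python
def children(language: set, u: str) -> set:
    """Children of node u in the tree (L; <=_pref).

    These are the words v in L with u <_pref v such that no word of L
    lies strictly between u and v in the prefix order.
    """
    below = [v for v in language if v != u and v.startswith(u)]
    return {v for v in below
            if not any(w != v and v.startswith(w) for w in below)}
```

For `L = {"", "ab", "abb", "b"}` the root `""` has the children `ab` and `b`, and `abb` is the only child of `ab`.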

A countable tree T is called regular if T has only finitely many subtrees up to
isomorphism. Equivalently, a countable tree is regular if it is isomorphic to a tree of the
form $(L; \le_{\mathrm{pref}})$, where L is a regular language with $\varepsilon \in L$. We require that the empty
word $\varepsilon$ belongs to L in order to ensure the existence of a root (otherwise $(L; \le_{\mathrm{pref}})$
would be only a forest). If L is accepted by the accessible DFA A, then the subtrees of
$(L; \le_{\mathrm{pref}})$ correspond to the final states of A. Note that by our definition, a regular tree
need not be finitely branching.

Our definition of a regular tree (having only finitely many subtrees up to isomorphism)
makes sense for other types of trees as well, e.g. for node-labeled trees or ordered
trees (where the children of a node are linearly ordered). These variants of regular
trees can be generated by finite automata as well. For instance, a node-labeled regular
tree $(L; \le_{\mathrm{pref}}, (L_a)_{a \in \Gamma})$, where $\Gamma$ is the finite labeling alphabet and $L_a$ is the set
of a-labeled nodes can be specified by a partitioned DFA $(Q, \Sigma, \delta, q_0, (F_a)_{a \in \Gamma})$ with
$L_a = L(Q, \Sigma, \delta, q_0, F_a)$ and $L = \bigcup_{a \in \Gamma} L_a$. We do not consider node labels in this
paper, since it makes no difference for the isomorphism problem (node labels can be
eliminated by adding additional children to nodes). Ordered regular trees will be briefly
considered in Section 4.8.
2.4 Linear orders



See [26] for a thorough introduction into linear orders. Let $\eta$ be the order type of the
rational numbers, $\omega$ the order type of the natural numbers, and $\overline{\omega}$ the order type of
the negative integers. With $\mathbf{n}$ we denote a finite linear order with n elements. Let
$\Lambda = (L; \le)$ be a linear order. $\Lambda$ is dense if L consists of at least two elements, and for all
$x < y$ there exists z with $x < z < y$. By Cantor's theorem, every countable dense
linear order, which neither has a smallest nor largest element is isomorphic to $\eta$. Hence,
if we take symbols 0 and 1 with $0 < 1$, then $(\{0, 1\}^*1; \le_{\mathrm{lex}}) \cong \eta$. The linear order $\Lambda$
is scattered if there does not exist an injective order morphism $\varphi : \eta \to \Lambda$. Clearly,
$\omega$, $\overline{\omega}$, as well as every finite linear order are scattered. A linear order is regular if it is
isomorphic to a linear order $(L; \le_{\mathrm{lex}})$ for a regular language L. Hence, for instance, $\eta$,
$\omega$, $\overline{\omega}$, and every finite linear order are regular linear orders.

For two linear orders $\Lambda_1 = (L_1; \le_1)$ and $\Lambda_2 = (L_2; \le_2)$ with $L_1 \cap L_2 = \emptyset$ we
define the sum $\Lambda_1 + \Lambda_2 = (L_1 \cup L_2; \le)$, where $x \le y$ if and only if either $x, y \in L_1$
and $x \le_1 y$, or $x, y \in L_2$ and $x \le_2 y$, or $x \in L_1$ and $y \in L_2$. We define the product
$\Lambda_1 \cdot \Lambda_2 = (L_1 \times L_2; \le)$ where $(x_1, x_2) \le (y_1, y_2)$ if and only if either $x_2 <_2 y_2$ or
($x_2 = y_2$ and $x_1 \le_1 y_1$).

An interval of $\Lambda$ is a subset $I \subseteq L$ such that $x < z < y$ and $x, y \in I$ implies $z \in I$.
An interval is right-closed (resp. left-closed) if it has a greatest (resp. smallest) element
and it is closed if it is both right-closed and left-closed. An interval I is dense (resp.,
scattered) if the linear order $\le$ restricted to I is dense (resp., scattered). A predecessor
(resp., successor) of $x \in L$ is a largest (resp., smallest) element of $\{y \in L \mid y < x\}$
(resp., $\{y \in L \mid x < y\}$). Of course, a predecessor (resp., successor) of x need not
exist, but if it exists then it is unique.
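
For finite linear orders, the sum and product defined above can be written down directly. In the following Python sketch a finite linear order is represented as a list of its elements in increasing order; this representation and the function names are assumptions for illustration.

```python
def order_sum(l1: list, l2: list) -> list:
    """Sum of two finite linear orders: all of l1 comes before all of l2."""
    if set(l1) & set(l2):
        raise ValueError("the underlying sets must be disjoint")
    return list(l1) + list(l2)


def order_product(l1: list, l2: list) -> list:
    """Product of two finite linear orders: pairs (x1, x2) compared by the
    second component first, i.e. one copy of l1 for every element of l2."""
    return [(x1, x2) for x2 in l2 for x1 in l1]
```

The product thus replaces every point of the second order by a copy of the first one.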

2.5 Generalized words 

Generalized words are countable colored linear orders. Let $\Sigma$ be a (possibly infinite)
alphabet. A generalized word (or simply word) u over $\Sigma$ is a triple $(L; \le, \tau)$ such that
L is a finite or countably infinite set, $\le$ is a linear order on L and $\tau : L \to \Sigma$ is a
coloring of L. The alphabet $\mathrm{alph}(u)$ equals the image of $\tau$. If L is finite, we obtain a
finite word in the usual sense. As for trees, we write $u \cong v$ for generalized words u and
v in case u and v are isomorphic.

Let $u = (L; \le, \tau)$ be a generalized word over $\Sigma$ with $\Gamma = \mathrm{alph}(u)$. Let
$v_a = (L_a; \le_a, \tau_a)$ be a generalized word for each $a \in \Gamma$. We define the generalized word
$u[(a/v_a)_{a \in \Gamma}] = (L'; \le', \tau')$ as follows:

- $L' = \{(x, y) \mid y \in L,\ x \in L_{\tau(y)}\}$,

- $(x, y) \le' (x', y')$ if and only if either $y < y'$ or ($y = y'$ and $x \le_{\tau(y)} x'$), and

- $\tau'(x, y) = \tau_{\tau(y)}(x)$.

Thus, $u[(a/v_a)_{a \in \Gamma}]$ is obtained from u by replacing every a-labelled point by $v_a$ (for all
$a \in \Gamma$). Now we can define the regular operations on words. In order to do so we need
the following words. The words $ab$ and $a^\omega$ for $a, b \in \Sigma$ are as usual. The generalized
word $a^{\overline{\omega}}$ has $\overline{\omega}$ as underlying order and every element is colored with a. Finally, we
let $[a_1, \ldots, a_n]^\eta$ be the generalized word with underlying order $\eta$ where the coloring
is such that any point is labeled by some $a_i$ ($1 \le i \le n$) and, moreover, for any two
points $x < y$ and any $1 \le i \le n$ we find a point z with $x < z < y$ colored by $a_i$. It can
be shown that this describes a unique word up to isomorphism [26].
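
For finite words, the substitution $u[(a/v_a)_{a \in \Gamma}]$ can be computed explicitly. The following Python sketch represents a finite generalized word as the list of its colors in increasing order, so positions are list indices; this representation and the function name are assumptions made for illustration. The result lists the points $(x, y)$ of the new word in increasing order together with their colors.

```python
def substitute(u: list, images: dict) -> list:
    """Substitution u[(a / v_a)] for finite words.

    u:      list of colors of u in increasing order
    images: maps every color a of u to the list of colors of v_a
    Returns the list of ((x, y), color) in the order of the new word, where
    y is a position of u and x is a position of v_a for a = u[y].
    """
    result = []
    for y, a in enumerate(u):                 # outer order: positions of u
        for x, c in enumerate(images[a]):     # inner order: positions of v_a
            result.append(((x, y), c))
    return result
```

Reading off only the colors of the result gives the usual image of u under the word homomorphism that maps a to $v_a$.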

Definition 2.3 (Regular Operations). Let $u, v, u_1, \ldots, u_n$ be words over $\Sigma$. We let:

$uv = (ab)[a/u, b/v]$, $u^\omega = a^\omega[a/u]$,

$[u_1, \ldots, u_n]^\eta = [a_1, \ldots, a_n]^\eta[a_1/u_1, \ldots, a_n/u_n]$, $u^{\overline{\omega}} = a^{\overline{\omega}}[a/u]$.

Thus, the underlying linear order of uv is the sum of the underlying linear orders
of u and v. Intuitively, we have $u^\omega = uuu\cdots$ and $u^{\overline{\omega}} = \cdots uuu$. Note that since
$[u_1, \ldots, u_n]^\eta$ is invariant under permutations of the $u_i$ we also sometimes use the notation
$X^\eta$ for a finite set X. The least set of words which is closed under the regular operations
and contains the singleton words a for $a \in \Sigma$ is called the set of regular words
over $\Sigma$, denoted $\mathrm{Reg}(\Sigma)$. Note that this implies that every regular word is non-empty,
i.e., its domain is a non-empty set. Moreover, although we allow $\Sigma$ to be infinite (this
will be useful later), the alphabet $\mathrm{alph}(u)$ of a regular word u must be finite. Clearly,
every regular word can be described by a regular expression over the above operations,
but this regular expression is in general not unique.

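Example 2.4. Here are some typical identities between regular words, where X is a
finite set of regular words, $n \ge 0$, $m \ge 1$, $u, u_1, \ldots, u_n \in X$, every $v_i$ ($1 \le i \le m$)
has one of the forms $X^\eta$, $yX^\eta$, $X^\eta z$, $yX^\eta z$ with $y, z \in X$, and v, w are regular words:

$X^\eta X^\eta \cong X^\eta u X^\eta \cong (X^\eta)^\omega \cong (X^\eta u)^\omega \cong (X^\eta)^{\overline{\omega}} \cong (uX^\eta)^{\overline{\omega}} \cong X^\eta$,

$[u_1, \ldots, u_n, v_1, \ldots, v_m]^\eta \cong X^\eta$,

$(vw)^\omega \cong v(wv)^\omega$, $(vw)^{\overline{\omega}} \cong (wv)^{\overline{\omega}} w$.

See [2] for a complete axiomatization of the equational theory of regular words.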

By a result of Heilbrunner [13], regular words can be characterized by partitioned
DFAs as follows: Let $A = (Q, \Gamma, \delta, q_0, (F_a)_{a \in \Sigma})$ be a partitioned DFA, and let
$B = (Q, \Gamma, \delta, q_0, \bigcup_{a \in \Sigma} F_a)$. Let us fix a linear order on the alphabet $\Gamma$, so that the
lexicographic order $\le_{\mathrm{lex}}$ is defined on $\Gamma^*$. Then we denote with $w(A)$ the generalized word

$w(A) = (L(B); \le_{\mathrm{lex}}, \tau)$,

where $\tau(u) = a$ ($a \in \Sigma$, $u \in L(B)$) if and only if $u \in L(Q, \Gamma, \delta, q_0, F_a)$. It is easy to
construct from a given regular expression (describing the regular word u) a partitioned
DFA A with $u \cong w(A)$, see e.g. [29, proof of Proposition 2] for a simple construction.
The other direction is more difficult. Heilbrunner has shown in [13] how to compute
from a given partitioned DFA A (such that $w(A)$ is non-empty) a regular expression
for the word $w(A)$, which is therefore regular.³ Unfortunately, the size of the regular
expression produced by Heilbrunner's algorithm is exponential in the size of A. In

³ In fact, Heilbrunner speaks about systems of equations and their least solutions instead of
partitioned DFAs. But these two formalisms can be easily (and efficiently) transformed into
each other.
Section 4.4, we will see that a succinct representation of a regular expression for $w(A)$
can be produced in polynomial time.
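
For an acyclic partitioned DFA the word $w(A)$ is finite and can be written out explicitly. The following Python sketch does this under illustrative assumptions: the transition map is a dictionary `delta`, the partition of the final states is given by a dictionary `color` from final states to colors, and the linear order on the input alphabet is given by the string `order` (smallest symbol first). A depth-first traversal that visits a word before its extensions and follows the symbols in increasing order enumerates the language in lexicographic order.

```python
def word_of_acyclic_pdfa(delta: dict, q0, color: dict, order: str) -> list:
    """Return w(A) as a list of (word, color) pairs in lexicographic order.

    delta: partial map {(state, symbol): state}; must be acyclic
    color: maps each final state to its color
    order: the input alphabet, smallest symbol first
    """
    def walk(q, prefix):
        if q in color:                       # prefix is accepted
            yield prefix, color[q]
        for a in order:                      # smaller symbols first
            if (q, a) in delta:
                yield from walk(delta[(q, a)], prefix + a)

    return list(walk(q0, ""))
```

The explicit list can be exponentially long in the number of states, which is exactly why acyclic partitioned DFAs act as a compressed representation of finite words.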

One can show that the isomorphism problem for regular words (given by partitioned
DFAs) can be reduced (in logspace) to the isomorphism problem for regular linear orders
(given by DFAs). In other words, node labels can be eliminated as for regular trees
(as remarked at the end of Section 2.3). So, the reader might ask, why we consider the
isomorphism problem for regular words and do not restrict to regular linear orders. The
point is that even if we start with regular linear orders, in the course of our polynomial
isomorphism check regular words will naturally arise.

3 Isomorphism problem for regular trees 

In this section, we investigate the isomorphism problem for (unordered) regular trees. 
We consider two input representations for regular trees: DFAs and NFAs. It turns out 
that while the isomorphism problem for DFA-represented regular trees is P-complete, 
the same problem becomes EXPTIME-complete for NFA-represented regular trees. 
Moreover, we show that for finite trees that are succinctly represented by acyclic NFAs, 
isomorphism is PSPACE-complete. 

3.1 Upper bounds 

Theorem 3.1. The following problem can be solved in polynomial time:

INPUT: Two DFAs $A_1$ and $A_2$ such that $\varepsilon \in L(A_1) \cap L(A_2)$.
QUESTION: $(L(A_1); \le_{\mathrm{pref}}) \cong (L(A_2); \le_{\mathrm{pref}})$?

Proof. By taking the disjoint union of $A_1$ and $A_2$, it suffices to solve the following
problem in polynomial time:

INPUT: A DFA A without initial state and two final states p, q of A.
QUESTION: $(L(A, p); \le_{\mathrm{pref}}) \cong (L(A, q); \le_{\mathrm{pref}})$?

Note that $\varepsilon \in L(A, p) \cap L(A, q)$ since p and q are final. Let $A = (Q, \Sigma, \delta, F)$. In fact,
we will compute in polynomial time the equivalence relation

$\mathrm{iso} = \{(p, q) \in F \times F \mid (L(A, p); \le_{\mathrm{pref}}) \cong (L(A, q); \le_{\mathrm{pref}})\}$.

This will be done similarly to the classical partition refinement algorithm for checking
bisimilarity of finite state systems [15].

For $p \in F$ and $C \subseteq F$ let $L(A, p, C)$ be the set of all words accepted by the DFA
$(Q, \Sigma, \delta, p, C)$. Hence, the sets $L(A, p, \{q\})$ ($q \in F$) partition $L(A, p)$. Let us say that
a node $u \in L(A, p)$ is of type q if $u \in L(A, p, \{q\})$. For $p \in F$ and $C \subseteq F$ let us
define the subset $K(A, p, C) \subseteq L(A, p, C)$ as the set of all words over $\Sigma$ labeling a
path from p to a state from C without intermediate final states; this is clearly a regular
language and a DFA for $K(A, p, C)$ can be easily computed in polynomial time from
A, p, and C: We take the DFA A and remove every transition leaving a final state from
F. Moreover, we introduce a copy p' of p, which will be the new initial state and there
is an a-labeled transition from p' to q if and only if there is an a-labeled transition from
p to q in A. Finally, C is the set of final states.

Note that if $u \in L(A, p)$ is of type q, then the nodes uv with $v \in K(A, q, F)$ are
exactly the children of u in the tree $(L(A, p); \le_{\mathrm{pref}})$. Let $n(p, q) \in \mathbb{N} \cup \{\infty\}$ be the
cardinality of the language $K(A, p, \{q\})$. By Lemma 2.2, each of these numbers $n(p, q)$
can be computed in polynomial time. For $C \subseteq F$ let $n(p, C) = \sum_{q \in C} n(p, q)$. Thus
$n(p, C)$ is the cardinality of the language $K(A, p, C)$.

Let us now compute the equivalence relation iso. As already remarked, this will be
done by a partition refinement algorithm. Assume that R is an equivalence relation on
F. We define the new equivalence relation $\widetilde{R}$ on F as follows:

$\widetilde{R} = \{(p, q) \in R \mid n(p, C) = n(q, C) \text{ for every equivalence class } C \text{ of } R\}$.

Thus, $\widetilde{R}$ is a refinement of R which can be computed in polynomial time from R. Let us
define a sequence of equivalence relations $R_0, R_1, \ldots$ on F as follows: $R_0 = F \times F$,
$R_{i+1} = \widetilde{R_i}$. Then, there exists $k < |F|$ such that $R_k = R_{k+1}$. We claim that $R_k = \mathrm{iso}$.
A simple argument shows that for every equivalence relation R on F with $\mathrm{iso} \subseteq R$, one
has $\mathrm{iso} \subseteq \widetilde{R}$ as well. Hence, by induction over $i \ge 0$, one gets $\mathrm{iso} \subseteq R_i$ for all $i \ge 0$.
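
The refinement loop can be sketched in Python as follows. The sketch assumes that the numbers $n(p, q)$ have already been computed (for instance via Lemma 2.2) and are supplied as a function `n` returning an integer or `math.inf`; the function name and the representation of an equivalence relation as a list of blocks are choices made for this illustration.

```python
import math


def refine_to_fixpoint(finals: list, n) -> list:
    """Compute the coarsest stable partition of the final states.

    finals: the final states F
    n:      function n(p, q) giving the number of words leading from p to q
            without intermediate final states (an int or math.inf)
    Returns the blocks of the resulting equivalence relation as frozensets.
    """
    partition = [frozenset(finals)]              # R_0 = F x F
    while True:
        def signature(p):
            # the vector (n(p, C)) over all current blocks C
            return tuple(sum(n(p, q) for q in block) for block in partition)

        new_partition = []
        for block in partition:                  # split each block by signature
            groups = {}
            for p in block:
                groups.setdefault(signature(p), set()).add(p)
            new_partition.extend(frozenset(g) for g in groups.values())

        if len(new_partition) == len(partition):  # no block was split
            return new_partition
        partition = new_partition
```

Two final states end up in the same block exactly when they have, for every block, the same number of children whose type lies in that block, which is the fixpoint condition used in the proof.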

For the other direction, we show that if R is an equivalence relation on F such that
$R = \widetilde{R}$ (this holds for $R_k$), then $R \subseteq \mathrm{iso}$. So, assume that $(p_1, p_2) \in R = \widetilde{R}$. We
will define an isomorphism $f : (L(A, p_1); \le_{\mathrm{pref}}) \to (L(A, p_2); \le_{\mathrm{pref}})$ as the limit of
isomorphisms $f_n$, $n \ge 1$. Here, $f_n$ is an isomorphism between the trees that result
from $(L(A, p_1); \le_{\mathrm{pref}})$ and $(L(A, p_2); \le_{\mathrm{pref}})$ by cutting off all nodes below level n
(the roots are on level 1). Let us call these trees $(L(A, p_i); \le_{\mathrm{pref}}){\restriction}n$ ($i \in \{1, 2\}$).
Moreover, $f_n$ has the additional property that if $f_n$ maps a node $u_1$ of type $q_1$ to a node
$u_2$ of type $q_2$, then we will have $(q_1, q_2) \in R$. Assume that $f_n$ is already constructed
and let $u_1$ of type $q_1$ be a leaf of $(L(A, p_1); \le_{\mathrm{pref}}){\restriction}n$. Let $u_2 = f_n(u_1)$ be of type $q_2$;
it is a leaf of $(L(A, p_2); \le_{\mathrm{pref}}){\restriction}n$. Then we have $(q_1, q_2) \in R = \widetilde{R}$ and hence for
every equivalence class C of R we have $n(q_1, C) = n(q_2, C)$. We can therefore find a
bijection g between the languages $K(A, q_1, F)$ and $K(A, q_2, F)$ such that for all
$v \in K(A, q_1, F)$, the state reached from $q_1$ via v and the state reached from $q_2$ via $g(v)$
are related by R. Note that the nodes $u_i v$ with $v \in K(A, q_i, F)$ are the children of $u_i$ in
the tree $(L(A, p_i); \le_{\mathrm{pref}})$. We now extend the isomorphism $f_n$ by g and do this for all
leaves $u_1$ of $(L(A, p_1); \le_{\mathrm{pref}}){\restriction}n$. This gives us the isomorphism $f_{n+1}$. □

Corollary 3.2. The following problem belongs to EXPTIME:

INPUT: Two NFAs $A_1$ and $A_2$ such that $\varepsilon \in L(A_1) \cap L(A_2)$.
QUESTION: $(L(A_1); \leq_{\mathrm{pref}}) \cong (L(A_2); \leq_{\mathrm{pref}})$?

Proof. In exponential time, we can transform $A_1$ and $A_2$ into DFAs using the powerset
construction. Then we can apply Theorem 3.1. □
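
For readers who want the first step of this proof spelled out, the following is a minimal sketch of the standard powerset construction. The representation of the NFA (a dict `delta` from `(state, symbol)` to a set of states) and the function name are assumptions made for the example.

```python
def determinize(alphabet, delta, start, finals):
    """Powerset construction (sketch).

    delta maps (state, symbol) to a set of successor states.
    Returns (states, dfa_delta, initial_state, final_states), where the
    DFA states are frozensets of NFA states and the DFA is partial
    (the empty subset is omitted).
    """
    finals = set(finals)
    init = frozenset([start])
    dfa_delta = {}
    seen = {init}
    todo = [init]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = frozenset(q for p in subset
                               for q in delta.get((p, a), ()))
            if not target:
                continue                      # no successor: skip dead state
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    dfa_finals = {s for s in seen if s & finals}
    return seen, dfa_delta, init, dfa_finals

# Toy NFA (made up) accepting the words over {a, b} that end in "ab".
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa_states, dfa_delta, dfa_init, dfa_finals = determinize("ab", nfa_delta, 0, {2})
```

The number of reachable subsets can be exponential in the number of NFA states, which is where the exponential time bound comes from.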

Theorem 3.3. The following problem belongs to PSPACE:

INPUT: Two acyclic NFAs $A_1$ and $A_2$ such that $\varepsilon \in L(A_1) \cap L(A_2)$.
QUESTION: $(L(A_1); \leq_{\mathrm{pref}}) \cong (L(A_2); \leq_{\mathrm{pref}})$?

Proof. By [21], isomorphism for finite trees, given explicitly by adjacency lists, can
be decided in deterministic logspace. Hence, by Lemma 2.1 it suffices to show that for
a given acyclic NFA $A$, the adjacency list representation for the tree $(L(A); \leq_{\mathrm{pref}})$ can
be computed by a PSPACE-transducer. This is straightforward. Assume that $\Sigma$ is the
alphabet of $A$ and that $n$ is the number of states of $A$. Let us fix an arbitrary order on $\Sigma$
and let $z$ be the largest symbol in $\Sigma$.

The language $L(A)$ only contains words of length at most $n - 1$. In an outer loop
we generate the language $L(A)$. For this, we enumerate all words (e.g. in lexicographic
order) of length at most $n - 1$ and test whether the current word is accepted by $A$. For
each enumerated word $u \in L(A)$, we have to output a list of all children of $u$ in the tree
$(L(A); \leq_{\mathrm{pref}})$. In an inner loop, we enumerate (again in lexicographic order) all words
$uv$ ($v \in \Sigma^+$) of length at most $n - 1$ and check whether $uv \in L(A)$. In case we find
such a word $uv \in L(A)$, we output $uv$ and do the following: If $v \in \{z\}^+$, then the
inner loop terminates. On the other hand, if $v = v'az^k$ for some $k \geq 0$, where $a \neq z$, then we jump in
the inner loop to the word $uv'b$, where $b$ is the symbol following $a$ in our order. □
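
The inner loop of this proof can be illustrated as follows. This is a sketch, not the PSPACE-transducer itself: it lists the children of a word `u` by a depth-first search that stops descending as soon as an accepted word is found, which has the same effect as the lexicographic jump described above. The NFA encoding and all function names are assumptions.

```python
def accepts(delta, start, finals, word):
    """Simulate the NFA on word; delta maps (state, symbol) to a set."""
    current = {start}
    for a in word:
        current = {q for p in current for q in delta.get((p, a), ())}
    return bool(current & set(finals))

def children(alphabet, delta, start, finals, n, u):
    """Children of u in the tree (L(A); prefix order) of an acyclic NFA.

    n is the number of states, so only words of length <= n - 1 matter.
    A child of u is a word uv in L(A), v non-empty, such that no word
    strictly between u and uv in the prefix order belongs to L(A).
    """
    found = []

    def extend(v):
        if len(u) + len(v) > n - 1:
            return
        if v and accepts(delta, start, finals, u + v):
            found.append(u + v)   # child found: skip all its extensions
            return
        for a in sorted(alphabet):
            extend(v + a)

    extend("")
    return found

# Toy acyclic NFA (made up) with L(A) = {"", "aa", "ab", "b"}.
toy_delta = {(0, "a"): {1}, (1, "a"): {2}, (1, "b"): {2}, (0, "b"): {3}}
root_children = children("ab", toy_delta, 0, {0, 2, 3}, 4, "")
```

The recursion depth is bounded by $n$, so the sketch uses polynomial space even though it may visit exponentially many words.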

3.2 Lower bounds 

The main result of this section states that the isomorphism problem for regular trees
that are represented by NFAs is EXPTIME-hard, which matches the upper bound from
the previous section. It is straightforward to prove PSPACE-hardness. If $\Sigma$ is the
underlying alphabet of a given NFA $A$, then $(L(A); \leq_{\mathrm{pref}})$ is a full $|\Sigma|$-ary tree if and only
if $L(A) = \Sigma^*$. But universality for NFAs is PSPACE-complete [28]. The proof for
the EXPTIME lower bound is more involved. Here is a rough outline: EXPTIME
coincides with alternating polynomial space [5]. Checking whether a given input is accepted
by a polynomial space bounded alternating Turing machine $M$ amounts to evaluating a
Boolean expression whose gates correspond to configurations of $M$. Using a
construction from [14], the evaluation problem for (finite) Boolean expressions can be reduced
to the isomorphism problem for (finite) trees. In our case, the Boolean expression will
be infinite. Nevertheless, the infinite Boolean expressions we have to deal with can be
evaluated because on every infinite path that starts in the root (the output gate) there will
be either an and-gate, where one of the inputs is a false-gate, or an or-gate, where one
of the inputs is a true-gate. Applying the construction from [14] to an infinite Boolean
expression (that arises from our construction) will yield two infinite trees, which are
isomorphic if and only if our Boolean expression evaluates to true. Luckily, these two
trees turn out to be regular, and they can be represented by small NFAs.

Infinite Boolean formulas. Let us fix the alphabet

\[ \Omega = \{a, \ell_\wedge, \ell'_\wedge, r_\wedge, \ell_\vee, \ell'_\vee, r_\vee\}. \tag{1} \]

In the following, we will only consider prefix-closed trees over the alphabet $\Omega$ (we will
not mention this explicitly all the time). Moreover, we will identify the tree $(L; \leq_{\mathrm{pref}})$
with the language $L$. Now, consider such a tree $T \subseteq \Omega^*$. Then, $T$ is well-formed if the
following conditions hold:

(a) If $u = \varepsilon$ or $u \in T$ ends with $\ell_\vee$, $\ell_\wedge$, $r_\vee$, or $r_\wedge$, then $\mathrm{child}(u, T)$ is one of the
following sets, where $\circ \in \{\vee, \wedge\}$: $\{u\ell_\circ, ur_\circ\}$, $\{u\ell'_\circ, ur_\circ\}$, $\{ua, u\ell'_\circ, ur_\circ\}$.

(b) If $u \in T$ ends with $a$, $\ell'_\vee$, or $\ell'_\wedge$, then $u$ is a leaf of $T$.

(c) For every infinite path $P$ in $T$, there exists $u \in P$ with $ua \in T$.

Note that a well-formed tree $T$ is always infinite; it contains an infinite path of the form
$r_1 r_2 r_3 \cdots$, where $r_i \in \{r_\wedge, r_\vee\}$ for all $i \geq 1$. Let us define the set

\[ \mathrm{cut}(T) = \{u \in T \mid ua \in T,\ \forall v <_{\mathrm{pref}} u : va \notin T\}. \tag{2} \]

Hence, on every infinite path in $T$ there is a unique node from $\mathrm{cut}(T)$.

With a well-formed tree $T$ we associate an infinite Boolean expression $\mathrm{bool}(T)$ as
follows: The gates of $\mathrm{bool}(T)$ are the nodes of $T$ that do not end with $a$.

- The set of input gates for $u \in T$ is $\mathrm{child}(u, T) \setminus \{ua\}$.

- If $ur_\vee \in T$ (resp. $ur_\wedge \in T$), then $u$ is an or-gate (resp. and-gate).

- If $u\ell'_\wedge \in T$ and $ua \notin T$, then $u\ell'_\wedge$ is a true-gate.

- If $u\ell'_\wedge \in T$ and $ua \in T$, then $u\ell'_\wedge$ is a false-gate.

- If $u\ell'_\vee \in T$ and $ua \notin T$, then $u\ell'_\vee$ is a false-gate.

- If $u\ell'_\vee \in T$ and $ua \in T$, then $u\ell'_\vee$ is a true-gate.

Although $\mathrm{bool}(T)$ is an infinite Boolean formula, the fact that $T$ is well-formed ensures
that the root of $\mathrm{bool}(T)$ can be evaluated: We simply remove from $T$ all nodes that
have a proper prefix from $\mathrm{cut}(T)$. The resulting tree has no infinite path and since it is
finitely branching it is finite by König's lemma. If $u \in \mathrm{cut}(T)$ is such that $u\ell'_\wedge \in T$
(resp., $u\ell'_\vee \in T$), then $u$ can be transformed into a false-gate (resp., true-gate). Then,
one has to evaluate the resulting finite Boolean expression.
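
The evaluation procedure just described can be sketched as a short recursive function. The encoding is an assumption made for illustration: a tree is given lazily by a function `children(u)` that returns the set of symbols `s` with `u + (s,)` in $T$, and the seven symbols of $\Omega$ are written as the strings `"a"`, `"l_and"`, `"l'_and"`, `"r_and"`, `"l_or"`, `"l'_or"`, `"r_or"`.

```python
def evaluate(children, u=()):
    """Evaluate the gate u of bool(T) for a well-formed tree T.

    children(u) returns the set of symbols s such that u + (s,) is in T.
    The recursion never descends below a node of cut(T), so it only
    visits the finite tree obtained by cutting T as described above.
    """
    kids = children(u)
    is_and = "r_and" in kids        # u r_and in T: u is an and-gate
    if "a" in kids:                 # u belongs to cut(T)
        # the primed leaf is a false-gate under an and-gate and a
        # true-gate under an or-gate, which already decides u
        return not is_and
    if is_and:
        left = True if "l'_and" in kids else evaluate(children, u + ("l_and",))
        return left and evaluate(children, u + ("r_and",))
    left = False if "l'_or" in kids else evaluate(children, u + ("l_or",))
    return left or evaluate(children, u + ("r_or",))

# Finite top part of a made-up well-formed tree; nodes below cut(T)
# are never requested, so they need not be listed.
toy_tree = {
    (): {"l_or", "r_or"},
    ("l_or",): {"a", "l'_and", "r_and"},
    ("r_or",): {"l'_and", "r_and"},
    ("r_or", "r_and"): {"a", "l'_or", "r_or"},
}
value = evaluate(lambda u: toy_tree[u])
```

Termination relies on condition (c): every infinite path contains a node $u$ with $ua \in T$, at which the recursion stops.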

We next transform a tree $T \subseteq \Omega^*$ into trees $[T]_1, [T]_2 \subseteq \{\ell, r\}^*$ using two rational
transducers. These two transducers only differ in their initial state. For $i \in \{1, 2\}$, let
$\mathcal{T}_i$ be the transducer from Figure 1, where the initial state is $q_i$ and all states are final.
Then, for a tree $T \subseteq \Omega^*$ and $i \in \{1, 2\}$ let $[T]_i = \mathrm{pref}(\mathcal{T}_i(T))$. We will show that for
every well-formed tree $T \subseteq \Omega^*$: $\mathrm{bool}(T)$ evaluates to true if and only if $[T]_1 \cong [T]_2$
(Lemma 3.9). For this, we first have to show a few lemmas.

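Lemma 3.4. Let $T = \{\varepsilon, \ell'_\vee\} \cup r_\vee U$ or $T = \{\varepsilon, \ell'_\wedge\} \cup r_\wedge U$ for a tree $U$ (hence, also
$T$ is a tree). Then $[T]_1 \cong [T]_2$ if and only if $[U]_1 \cong [U]_2$.

Proof. We only prove the lemma for $T = \{\varepsilon, \ell'_\vee\} \cup r_\vee U$; the statement for
$T = \{\varepsilon, \ell'_\wedge\} \cup r_\wedge U$ can be shown analogously. Let us compute $\mathcal{T}_1(T)$ and $\mathcal{T}_2(T)$.
We have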

Ti{£'y) = T2{£'y) = {£\re). (3) 

Next, we have to compute $\mathcal{T}_1(r_\vee U)$. There are two transitions starting in $q_1$, where $r_\vee$
can be read, namely

qi > q2 and qi > qi. 

Hence, we get 

Ti {ry U) = r^£Ti{U)U £r£ 7i (C/) . (4) 
Fig. 1. The transducer

Fig. 2. $[T]_1$ (left) and $[T]_2$ (right) from Lemma 3.4

Similarly, we get 

r2{ryU)= r''lT2{U)\JMri{U). (5) 

From (3), (4), and (5) it follows that the trees $[T]_i = \mathrm{pref}(\mathcal{T}_i(\ell'_\vee) \cup \mathcal{T}_i(r_\vee U))$
($i \in \{1, 2\}$) are the ones shown in Figure 2. The equivalence of $[T]_1 \cong [T]_2$ and $[U]_1 \cong [U]_2$
is obvious from these diagrams. □

The following three lemmas can be shown with the same kinds of arguments as for 
Lemma 3.4. We therefore only sketch the proofs. 

Lemma 3.5. Let $T = \{\varepsilon, \ell'_\vee, a\} \cup r_\vee U$ for a tree $U$ (hence, also $T$ is a tree). Then
$[T]_1 \cong [T]_2$.

Proof. We have Tiia) = {l^} and 7^ (a) = {i^, ir}. It follows, that the trees [T]i and 
$[T]_2$ are as shown in Figure 3. Clearly, we have $[T]_1 \cong [T]_2$. □
Fig. 3. $[T]_1$ (left) and $[T]_2$ (right) from Lemma 3.5

Fig. 4. $[T]_1$ (left) and $[T]_2$ (right) from Lemma 3.6

Lemma 3.6. Let $T = \{\varepsilon, \ell'_\wedge, a\} \cup r_\wedge U$ for a tree $U$ (hence, also $T$ is a tree). Then
$[T]_1 \not\cong [T]_2$.

Proof. The trees $[T]_1$ and $[T]_2$ are shown in Figure 4. Clearly, we have $[T]_1 \not\cong [T]_2$. □

Lemma 3.7. Let $T = \{\varepsilon\} \cup \ell_\vee U \cup r_\vee V$ for well-formed trees $U, V$ (hence, also $T$ is
well-formed). Then $[T]_1 \cong [T]_2$ if and only if ($[U]_1 \cong [U]_2$ or $[V]_1 \cong [V]_2$).

Proof. The trees $[T]_1$ and $[T]_2$ are shown in Figure 5. Since $U$ and $V$ are well-formed,
in each of the trees $[U]_1$, $[U]_2$, $[V]_1$, and $[V]_2$, the root has two children. It follows
easily that $[T]_1 \cong [T]_2$ if and only if ($[U]_1 \cong [U]_2$ or $[V]_1 \cong [V]_2$). □

Lemma 3.8. Let $T = \{\varepsilon\} \cup \ell_\wedge U \cup r_\wedge V$ for well-formed trees $U, V$ (hence, also $T$ is
well-formed). Then $[T]_1 \cong [T]_2$ if and only if ($[U]_1 \cong [U]_2$ and $[V]_1 \cong [V]_2$).

Proof. The trees $[T]_1$ and $[T]_2$ are as shown in Figure 6. Since $U$ and $V$ are
well-formed, in each of the trees $[U]_1$, $[U]_2$, $[V]_1$, and $[V]_2$, the root has two children. It
follows easily that $[T]_1 \cong [T]_2$ if and only if ($[U]_1 \cong [U]_2$ and $[V]_1 \cong [V]_2$). □

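Lemma 3.9. For every well-formed tree $T \subseteq \Omega^*$, we have: $\mathrm{bool}(T)$ evaluates to true
if and only if $[T]_1 \cong [T]_2$.

Fig. 5. $[T]_1$ (left) and $[T]_2$ (right) from Lemma 3.7

Fig. 6. $[T]_1$ (left) and $[T]_2$ (right) from Lemma 3.8

Proof. Recall the definition of the set $\mathrm{cut}(T)$ from (2). From the definition it follows
that $\mathrm{pref}(\mathrm{cut}(T))$ is a finitely branching tree without infinite paths. Hence, by König's
lemma it is finite. Moreover, for every $u \in \mathrm{pref}(\mathrm{cut}(T))$, the subtree $T{\upharpoonright}_u$ is
well-formed as well (since $\mathrm{pref}(\mathrm{cut}(T)) \subseteq \{\varepsilon\} \cup \Omega^*\{\ell_\vee, \ell_\wedge, r_\vee, r_\wedge\}$). Inductively over the
height of $u \in \mathrm{pref}(\mathrm{cut}(T))$ in the finite tree $\mathrm{pref}(\mathrm{cut}(T))$, we will prove for every
$u \in \mathrm{pref}(\mathrm{cut}(T))$: $[T{\upharpoonright}_u]_1 \cong [T{\upharpoonright}_u]_2$ if and only if $\mathrm{bool}(T{\upharpoonright}_u)$ evaluates to true.

For the induction base, let $u \in \mathrm{cut}(T)$ be a leaf of $\mathrm{pref}(\mathrm{cut}(T))$. Hence, we have
$ua \in T$. If $u\ell'_\wedge \in T$, then in $\mathrm{bool}(T{\upharpoonright}_u)$, the root is an and-gate for which one of
the inputs (namely $u\ell'_\wedge$) is a false-gate. Hence, $\mathrm{bool}(T{\upharpoonright}_u)$ evaluates to false. Moreover,
Lemma 3.6 implies that $[T{\upharpoonright}_u]_1 \not\cong [T{\upharpoonright}_u]_2$. On the other hand, if $u\ell'_\vee \in T$, then in
$\mathrm{bool}(T{\upharpoonright}_u)$, the root is an or-gate for which one of the inputs (namely $u\ell'_\vee$) is a
true-gate. Hence, $\mathrm{bool}(T{\upharpoonright}_u)$ evaluates to true. Moreover, Lemma 3.5 implies that
$[T{\upharpoonright}_u]_1 \cong [T{\upharpoonright}_u]_2$. This concludes the induction base.

Next, let $u \in \mathrm{pref}(\mathrm{cut}(T))$ be a proper prefix of a node from $\mathrm{cut}(T)$. In particular
$u \notin \mathrm{cut}(T)$. We can distinguish 4 different cases:

Case 1. $\mathrm{child}(u, T) = \{u\ell_\wedge, ur_\wedge\}$. We must have $\{u\ell_\wedge, ur_\wedge\} \subseteq \mathrm{pref}(\mathrm{cut}(T))$. Hence,
the induction hypothesis (IH) holds for $u\ell_\wedge$ and $ur_\wedge$. We get:

\[
\begin{aligned}
\mathrm{bool}(T{\upharpoonright}_u) \text{ evaluates to true}
&\iff \mathrm{bool}(T{\upharpoonright}_{u\ell_\wedge}) \text{ evaluates to true and } \mathrm{bool}(T{\upharpoonright}_{ur_\wedge}) \text{ evaluates to true} \\
&\overset{\text{IH}}{\iff} [T{\upharpoonright}_{u\ell_\wedge}]_1 \cong [T{\upharpoonright}_{u\ell_\wedge}]_2 \text{ and } [T{\upharpoonright}_{ur_\wedge}]_1 \cong [T{\upharpoonright}_{ur_\wedge}]_2 \\
&\overset{\text{Lemma 3.8}}{\iff} [T{\upharpoonright}_u]_1 \cong [T{\upharpoonright}_u]_2
\end{aligned}
\]

Case 2. $\mathrm{child}(u, T) = \{u\ell_\vee, ur_\vee\}$. This case is analogous to Case 1, using Lemma 3.7.

Case 3. $\mathrm{child}(u, T) = \{u\ell'_\wedge, ur_\wedge\}$. Since $u \notin \mathrm{cut}(T)$, we have $ua \notin T$. We must
have $ur_\wedge \in \mathrm{pref}(\mathrm{cut}(T))$. Moreover, in $\mathrm{bool}(T{\upharpoonright}_u)$, the root is an and-gate, where one
of the inputs is a true-gate and the other input is the root for the Boolean expression
$\mathrm{bool}(T{\upharpoonright}_{ur_\wedge})$. Hence, we get:

\[
\begin{aligned}
\mathrm{bool}(T{\upharpoonright}_u) \text{ evaluates to true}
&\iff \mathrm{bool}(T{\upharpoonright}_{ur_\wedge}) \text{ evaluates to true} \\
&\overset{\text{IH}}{\iff} [T{\upharpoonright}_{ur_\wedge}]_1 \cong [T{\upharpoonright}_{ur_\wedge}]_2 \\
&\overset{\text{Lemma 3.4}}{\iff} [T{\upharpoonright}_u]_1 \cong [T{\upharpoonright}_u]_2
\end{aligned}
\]

Case 4. $\mathrm{child}(u, T) = \{u\ell'_\vee, ur_\vee\}$. This case is analogous to Case 3. □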

Our last auxiliary lemma states that an NFA for the tree $[L]_i$ can be easily computed
from an NFA for $L$.

Lemma 3.10. There is a logspace machine that computes from a given prefix-closed
NFA $A$ with terminal alphabet $\Omega$ a prefix-closed NFA $B$ such that $L(B) = [L(A)]_i$ for
$i \in \{1, 2\}$.

Proof. Let $A = (Q, \Omega, \delta, p_0, Q)$. Recall that all states of $\mathcal{T}_i$ and $A$ are final. The
prefix-closed NFA $B$ is obtained from the direct product of $A$ and $\mathcal{T}_i$ by adding further states
so that every transition is labeled with a single symbol. Thus, the set of states of $B$
contains $Q \times \{q_1, q_2, s\}$ and the initial state of $B$ is $(p_0, q_i)$. If $q \xrightarrow{x} q'$ in $A$ and $t \xrightarrow{x \mid w} t'$
in $\mathcal{T}_i$ for $w \in \{\ell, r\}^+$, then we add $|w| - 1$ many new states to $B$, which build up a
$w$-labeled path from $(q, t)$ to $(q', t')$. □
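
The product construction in this proof can be sketched as follows. Since Figure 1 is not reproduced here, the transducer is passed in as data; the tuple formats `(q, x, q2)` for NFA transitions and `(t, x, w, t2)` for transducer transitions (read `x`, output the word `w`) are assumptions made for the example, as is the naming of the auxiliary states.

```python
def product_nfa(nfa_trans, transducer_trans):
    """Direct product of an NFA and a transducer, one symbol per edge.

    nfa_trans:        list of (q, x, q2)
    transducer_trans: list of (t, x, w, t2) with output word w, |w| >= 1
    Returns a list of transitions (source, symbol, target) of B. For
    each pair of matching transitions, |w| - 1 fresh states ("aux", k)
    are inserted so that a w-labeled path leads from (q, t) to (q2, t2).
    """
    trans = []
    fresh = 0
    for (q, x, q2) in nfa_trans:
        for (t, y, w, t2) in transducer_trans:
            if x != y or not w:
                continue
            src = (q, t)
            for k, symbol in enumerate(w):
                if k == len(w) - 1:
                    dst = (q2, t2)
                else:
                    dst = ("aux", fresh)
                    fresh += 1
                trans.append((src, symbol, dst))
                src = dst
    return trans

# Made-up toy data, not the transducer of Figure 1.
toy_nfa = [(0, "x", 1)]
toy_transducer = [("t1", "x", "lr", "t2"), ("t1", "y", "l", "t1")]
toy_product = product_nfa(toy_nfa, toy_transducer)
```

If all states of the product, including the auxiliary ones, are declared final, the accepted language is closed under prefixes, which corresponds to taking $\mathrm{pref}(\cdot)$ in the definition of $[T]_i$.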



EXPTIME-hardness. We are now in the position to prove the main result of this
section.

Theorem 3.11. The following problem is EXPTIME-hard (and hence EXPTIME-complete):

INPUT: Two prefix-closed NFAs $A_1$ and $A_2$.
QUESTION: $(L(A_1); \leq_{\mathrm{pref}}) \cong (L(A_2); \leq_{\mathrm{pref}})$?

Proof. The upper bound is stated in Corollary 3.2. For the lower bound we use the fact
that EXPTIME equals the class of all sets that can be accepted in polynomial space on
an alternating Turing machine [5]. Hence, let $M$ be a polynomial space bounded
alternating Turing machine such that the accepted language $L(M) \subseteq \{0, 1\}^*$ is
EXPTIME-complete. We can assume that $M$ has no infinite computation paths. By padding
inputs, we can moreover assume that $M$ works in space $n$ for an input of length $n$. Let
$Q = Q_\exists \cup Q_\forall$ be the set of states of $M$ and let $\Gamma \supseteq \{0, 1\}$ be the tape alphabet. W.l.o.g.
we can assume that in every computation step, $M$ moves from an existential state to a
universal state or vice versa, and that the initial state $q_0$ is universal.

Let us now fix an input $w \in \{0, 1\}^*$ of length $n$. We will construct two prefix-closed
NFAs $A_1$ and $A_2$ such that $w \in L(M)$ if and only if $(L(A_1); \leq_{\mathrm{pref}}) \cong (L(A_2); \leq_{\mathrm{pref}})$.
Let $\Theta = \Gamma \cup Q$. As usual, a configuration of $M$ can be represented by a string from
the language $\Theta^{n+1}$ (more precisely, from $\bigcup_{i=0}^{n-1} \Gamma^i Q \Gamma^{n-i}$). A word $u \in \Theta^*$ is a valid
computation of $M$ on input $w$ if $u$ is of the form $c_1 \cdots c_m$ for some $m \geq 0$ such that
the following holds:

- $c_i \in \bigcup_{j=0}^{n-1} \Gamma^j Q \Gamma^{n-j}$ for all $1 \leq i \leq m$

- $c_i \vdash_M c_{i+1}$ (i.e., $c_{i+1}$ is a successor configuration of $c_i$) for all $1 \leq i \leq m - 1$

- $q_0 w \vdash_M c_1$

Note that $\varepsilon$ is a valid computation in this sense. It is well known that from $w$ one can
construct in logspace a coaccessible NFA $A_w$ such that $A_w$ accepts all words over $\Theta$
that are not a valid computation of $M$ on $w$ [28].

Next, we will define a regular well-formed tree $T_w \subseteq \Omega^*$ (depending only on $w$)
such that $\mathrm{bool}(T_w)$ evaluates to true if and only if $w \in L(M)$. In the following, we
identify the symbols in $\Theta$ with the integers $0, \ldots, |\Theta| - 1$ in an arbitrary way. We can
assume that $|\Theta| \geq 2$. We define two morphisms

\[ \varphi_\circ : \Theta^* \to \{\ell_\circ, r_\circ\}^* \]

as follows ($\circ \in \{\wedge, \vee\}$): [...]

For $i \geq 1$, let $\varphi_i$ be the mapping $\varphi_\wedge$ (resp. $\varphi_\vee$) if $i$ is odd (resp., even). Similarly, for
$X \in \{\ell, \ell', r\}$, let $X_i$ be $X_\wedge$ (resp. $X_\vee$) if $i$ is odd (resp., even). Then, the tree $T_w \subseteq \Omega^*$
is $\mathrm{pref}(T'_w)$, where

\[ T'_w = [\ldots] \]

Clearly, $T_w$ is regular, and a prefix-closed NFA for $T_w$ can be computed in logspace
from $w$ (using the logspace computable coaccessible NFA $A_w$).


Claim 1: $T_w$ is well-formed.

Proof of Claim 1: The first three conditions for well-formed trees are easy to check. For
the last condition, we have to consider an arbitrary infinite path $P$ of $T_w$ and show that
there exists $u \in P$ such that $ua \in T_w$. But this means that $u$ is of the form

\[ u = \prod_{i=1}^{m} r_i \varphi_i(c_i) \]

with $m \geq 0$, $c_1, \ldots, c_m \in \Theta^{n+1}$, and $c_1 \cdots c_m \in L(A_w)$. The latter condition means
that $c_1 \cdots c_m$ is not a valid computation of $M$ on input $w$. Claim 1 now follows from
the fact that for every infinite sequence $c_1 c_2 c_3 \cdots$ with $c_i \in \Theta^{n+1}$ for $i \geq 1$ there exists
$m \geq 1$ such that $c_1 \cdots c_m$ is not a valid computation of $M$ on input $w$ (since $M$ does
not have infinite computation paths).

Claim 2: $w \in L(M)$ if and only if $\mathrm{bool}(T_w)$ evaluates to true.

Proof of Claim 2: Let us consider the finite tree $\mathrm{pref}(\mathrm{cut}(T_w))$. For every node

\[ g = r_\wedge \varphi_\wedge(c_1)\, r_\vee \varphi_\vee(c_2)\, r_\wedge \cdots r_{m-1}\varphi_{m-1}(c_{m-1})\, r_m \varphi_m(c_m) \in \mathrm{pref}(\mathrm{cut}(T_w)) \]

with $m \geq 0$ and $c_1, \ldots, c_m \in \Theta^{n+1}$ we will prove (by induction on the height of $g$) the
following: If $c_1 \cdots c_m$ is a valid computation of $M$ on input $w$, then $c_m$ is an accepting
configuration if and only if $g$ evaluates to true in $\mathrm{bool}(T_w)$. Here, for $m = 0$, we define
$c_0$ as the initial configuration $q_0 w$.

So, assume that $g \in \mathrm{pref}(\mathrm{cut}(T_w))$ is of the above form and that $c_1 \cdots c_m$ is a valid
computation of $M$ on input $w$. W.l.o.g. assume that $m$ is odd (the case that $m$ is even
can be dealt with analogously). Thus,

\[ g = r_\wedge \varphi_\wedge(c_1)\, r_\vee \varphi_\vee(c_2)\, r_\wedge \cdots \varphi_\vee(c_{m-1})\, r_\wedge \varphi_\wedge(c_m). \]

Then, in $\mathrm{bool}(T_w)$, the input gates for the or-gate $g$ are $g\ell'_\vee$ and $gr_\vee$. Since $c_1 \cdots c_m$
is a valid computation of $M$ on input $w$, $ga$ does not belong to the tree $T_w$. Hence,
in $\mathrm{bool}(T_w)$, $g\ell'_\vee$ is a false-gate. Thus, $g$ evaluates to true if and only if $gr_\vee$
evaluates to true. From the structure of $T_w$ we see that the latter holds if and only if
there exists $c_{m+1} \in \Theta^{n+1}$ such that $gr_\vee\varphi_\vee(c_{m+1})$ evaluates to true. First assume
that $c_{m+1}$ is such that $c_1 \cdots c_m c_{m+1}$ is not a valid computation. The inputs for the
and-gate $gr_\vee\varphi_\vee(c_{m+1})$ are $gr_\vee\varphi_\vee(c_{m+1})\ell'_\wedge$ and $gr_\vee\varphi_\vee(c_{m+1})r_\wedge$. Since $c_1 \cdots c_m c_{m+1}$ is
not a valid computation, $gr_\vee\varphi_\vee(c_{m+1})a$ belongs to the tree $T_w$. Thus, in $\mathrm{bool}(T_w)$,
$gr_\vee\varphi_\vee(c_{m+1})\ell'_\wedge$ is a false-gate and $gr_\vee\varphi_\vee(c_{m+1})$ evaluates to false. This holds for all
$c_{m+1}$ such that $c_1 \cdots c_m c_{m+1}$ is not a valid computation. Hence, $gr_\vee$ evaluates to true
if and only if there exists a configuration $c_{m+1} \in \Theta^{n+1}$ such that $c_1 \cdots c_m c_{m+1}$ is
a valid computation (which means that $c_{m+1}$ is a successor configuration of $c_m$) and
$gr_\vee\varphi_\vee(c_{m+1})$ evaluates to true in $\mathrm{bool}(T_w)$. Now, if $c_1 \cdots c_m c_{m+1}$ is a valid
computation, then by induction, $gr_\vee\varphi_\vee(c_{m+1})$ (which belongs to $\mathrm{pref}(\mathrm{cut}(T_w))$ as well)
evaluates to true in $\mathrm{bool}(T_w)$ if and only if $c_{m+1}$ is an accepting configuration of $M$.

We have shown that $g$ evaluates to true if and only if $c_m$ has an accepting successor
configuration. Finally, since $m$ is odd, $c_m$ is an existential configuration (recall that the
initial configuration $c_0 = q_0 w$ is universal). Thus, indeed, $g$ evaluates to true if and
only if $c_m$ is accepting. This proves Claim 2.

Let $\mathcal{T}_1$ and $\mathcal{T}_2$ be the rational transducers from Section 3.2. Using Lemma 3.10 we can
compute in logspace from a prefix-closed NFA for $T_w$ two prefix-closed NFAs $A_1$ and
$A_2$ such that $L(A_i) = [T_w]_i$ for $i \in \{1, 2\}$. By Lemma 3.9 and Claim 2, we have

\[ w \in L(M) \iff \mathrm{bool}(T_w) \text{ evaluates to true} \iff (L(A_1); \leq_{\mathrm{pref}}) \cong (L(A_2); \leq_{\mathrm{pref}}). \]

This concludes the proof of the EXPTIME lower bound. □



PSPACE-hardness 

Theorem 3.12. The following problem is PSPACE-hard (and therefore PSPACE-complete):

INPUT: Two prefix-closed acyclic NFAs $A_1$ and $A_2$.
QUESTION: $(L(A_1); \leq_{\mathrm{pref}}) \cong (L(A_2); \leq_{\mathrm{pref}})$?

Proof. The upper bound is stated in Theorem 3.3. For the lower bound, we use the same
idea as in the proof of Theorem 3.11. In fact, we will use most of the notations from that
proof; some of them will be slightly modified. This time, we use the fact that PSPACE
equals the class of all sets that can be accepted in polynomial time on an alternating
Turing machine. Hence, let $M$ be a polynomial time bounded alternating Turing
machine such that the accepted language $L(M) \subseteq \{0, 1\}^*$ is PSPACE-complete. Let $p(n)$
(a polynomial) be the time bound and let $q(n) = p(n) + 1$. We can assume that $q(n)$ is
odd for all $n \geq 0$. W.l.o.g. we can assume again that $M$ works in space $n$ for an input
of length $n$. Let $w \in \{0, 1\}^*$ be an input for $M$ of length $n$.

Let us add to the alphabet $\Omega$ in (1) an additional symbol $r'_\vee$. The notions from
Section 3.2 have to be extended to this new alphabet $\Omega$. In condition (a) for the definition of
a well-formed tree $T$, we also allow the set $\{ua, u\ell'_\vee, ur'_\vee\}$ for $\mathrm{child}(u, T)$. Moreover,
every node $ur'_\vee \in T$ is a leaf of $T$. The new definition for the set $\mathrm{cut}(T)$ can be taken
over from (2). Also the Boolean expression $\mathrm{bool}(T)$ can be defined as in Section 3.2;
the truth value of a leaf ending with $r'_\vee$ is set arbitrarily (say true). Finally, let us extend
the two transducers $\mathcal{T}_1$ and $\mathcal{T}_2$ such that, from $q_1$ and $q_2$, they can read the new symbol
$r'_\vee$ and output $\ell$ and then terminate in a sink state $s$.

We now define the well-formed tree $U_w \subseteq \Omega^*$ as $U_w = \mathrm{pref}(U'_w)$, where:

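\[
\begin{aligned}
U'_w ={} & \Big\{ \Big(\prod_{i=1}^{m} r_i \varphi_i(c_i)\Big)\, \ell'_{m+1} \ \Big|\ 0 \leq m \leq q(n),\ c_1, \ldots, c_m \in \Theta^{n+1} \Big\} \ \cup \\
& \Big\{ \Big(\prod_{i=1}^{m} r_i \varphi_i(c_i)\Big)\, a \ \Big|\ 0 \leq m \leq q(n),\ c_1, \ldots, c_m \in \Theta^{n+1},\ c_1 \cdots c_m \in L(A_w) \Big\} \ \cup \ [\ldots]
\end{aligned}
\]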



Note that $U_w$ is finite. An acyclic prefix-closed NFA for $U_w$ can be produced in logspace
from $w$. Moreover, since every word from $(\Theta^{n+1})^{q(n)}$ is not a valid computation (since
$M$ terminates after $\leq p(n) = q(n) - 1$ steps), the Boolean expressions $\mathrm{bool}(U_w)$ and
$\mathrm{bool}(T_w)$ (where $T_w$ was defined in the proof of Theorem 3.11) evaluate to the same
truth value. Hence, using Claim 2 from the proof of Theorem 3.11, it follows that
$w \in L(M)$ if and only if $\mathrm{bool}(U_w)$ evaluates to true. Using an analogue of Lemma 3.9,
this holds if and only if $[U_w]_1 \cong [U_w]_2$. Acyclic NFAs for $[U_w]_1$ and $[U_w]_2$ can be
easily constructed in logspace from $w$ (using an acyclic NFA for $U_w$). This concludes
the proof of the theorem. □



P-hardness 

Theorem 3.13. The following problem is P-hard (and hence P-complete):

INPUT: Two prefix-closed acyclic DFAs $A_1$ and $A_2$.
QUESTION: $(L(A_1); \leq_{\mathrm{pref}}) \cong (L(A_2); \leq_{\mathrm{pref}})$?

Proof. The upper bound is stated in Theorem 3.1. For the lower bound, we reduce
the P-complete monotone circuit value problem [12] to the problem from the theorem.
Note that the tree $(L(A); \leq_{\mathrm{pref}})$, where $A$ is a prefix-closed acyclic DFA, is just the
unfolding of the underlying dag (directed acyclic graph) in the initial state of $A$. Vice versa,
from a dag $D$ with a root node $r$ one can construct a prefix-closed acyclic DFA $A$ such
that $(L(A); \leq_{\mathrm{pref}})$ is isomorphic to the unfolding of $D$ in $r$ (let us denote the latter tree
by $\mathrm{unfold}(D, r)$). One only has to associate labels to the edges of $D$. Hence, it suffices
to construct from a given monotone circuit $C$ a dag $D$ which contains for every gate
$g$ of $C$ two nodes $g_1, g_2$ such that $g$ evaluates to true if and only if
$\mathrm{unfold}(D, g_1) \cong \mathrm{unfold}(D, g_2)$. This is straightforward for the input gates of $C$. For and- and or-gates of
$C$, we can use again the construction of [14]. Take the constructions from Figures 5 and
6, where in Figure 5 each of the subtrees $[U]_1$, $[U]_2$, $[V]_1$, and $[V]_2$ is represented only
once. The construction for or-gates is shown in Figure 7. Assume that the dag $D$ below
the nodes $u_1$, $u_2$, $v_1$, and $v_2$ is already constructed. Here $u_1$ and $u_2$ correspond to a gate
$u$ and $v_1$ and $v_2$ correspond to a gate $v$. Hence, $u$ (resp., $v$) evaluates to true if and only
if $\mathrm{unfold}(D, u_1) \cong \mathrm{unfold}(D, u_2)$ (resp., $\mathrm{unfold}(D, v_1) \cong \mathrm{unfold}(D, v_2)$). Let $t$ be an
or-gate with inputs $u$ and $v$. We add the nodes and edges as shown in Figure 7. Then the
arguments from the proof of Lemma 3.7 show that $u \vee v$ evaluates to true if and only
if $\mathrm{unfold}(D, t_1) \cong \mathrm{unfold}(D, t_2)$. □
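
To make the notion of unfolding concrete, the following sketch decides whether two nodes of a dag have isomorphic unfoldings without building the (possibly exponentially large) trees. It is an illustration of the upper-bound side only, not the reduction of this proof. The dag encoding (a dict from a node to the list of its successors, with repetitions for parallel edges) and the function name are assumptions.

```python
def unfolding_classes(dag):
    """Assign to every node of a dag the isomorphism class of its unfolding.

    dag maps each node to the list of its successors (repetitions allowed).
    Two nodes receive the same class id if and only if their unfoldings
    are isomorphic as unordered trees. Class ids are shared by memoization,
    so the running time is polynomial in the size of the dag.
    """
    ids = {}    # sorted tuple of child class ids -> class id
    cls = {}    # node -> class id

    def visit(v):
        if v not in cls:
            key = tuple(sorted(visit(w) for w in dag.get(v, [])))
            cls[v] = ids.setdefault(key, len(ids))
        return cls[v]

    for v in dag:
        visit(v)
    return cls

# Made-up dag: r1 and r2 both unfold to a root with two leaf children.
toy_dag = {"r1": ["x", "x"], "r2": ["x", "y"], "r3": ["x"], "x": [], "y": []}
toy_classes = unfolding_classes(toy_dag)
```

Edge labels are ignored on purpose, since the tree $(L(A); \leq_{\mathrm{pref}})$ only records the prefix order and not the letters.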



4 Isomorphism problem for regular words 

In this section we study the isomorphism problem for regular words that are represented 
by partitioned DFAs. We prove that this problem as well as the isomorphism problem 
for regular linear orders that are represented by DFAs are P-complete. It follows that 
the isomorphism problem for regular linear orders that are represented by NFAs can be 
solved in exponential time. We show that this problem is PSPACE-hard. For the case 
of acyclic DFAs and NFAs, respectively, we obtain completeness results for counting 
classes ($\mathsf{C}_{=}\mathsf{L}$-completeness for acyclic DFAs and $\mathsf{C}_{=}\mathsf{P}$-completeness for acyclic NFAs).

Fig. 7. The or-construction in the proof of Theorem 3.13

4.1 Upper bounds 

The main result of this section is: 

Theorem 4.1. The following problem can be solved in polynomial time:

INPUT: Two partitioned DFAs $A_1$ and $A_2$.
QUESTION: $w(A_1) \cong w(A_2)$?

In Sections 4.2-4.6 we prove Theorem 4.1. Section 4.2 will introduce some of the
machinery from [2] concerning blocks. Blocks allow us to condense a generalized word to
a coarser word (whose elements are the blocks of the original word). In Section 4.3 we
will formally introduce succinct regular expressions (expressions in form of dags) and
in Section 4.4 we will argue that Heilbrunner's algorithm from [13] allows us to
transform a given partitioned DFA in polynomial time into an equivalent succinct (regular)
expression. Hence, the remaining goal is to develop a polynomial time algorithm for
checking whether two given succinct expressions represent isomorphic regular words.
For the special case that these regular words consist of only one block (so-called
primitive regular words), this will be accomplished in Section 4.5. In this step, we will make
use of algorithms for straight-line programs (succinctly represented finite words) [27].
Finally, in Section 4.6 we will present a polynomial time algorithm for checking whether
two given succinct expressions represent isomorphic regular words.

4.2 Blocks and their combinatorics 

In this section, we will introduce the crucial notion of a block, and we recall some of 
the results from [2] that we are using later. 

Let $u = (L; \leq, \tau)$ be a generalized word. An interval of $u$ is an interval of the
underlying linear order $(L; \leq)$. A subword of $u$ is an interval $I$ of $u$ together with the coloring
$\tau$ restricted to $I$. Let $\Gamma \subseteq \Sigma$ be finite. A $\Gamma$-uniform subword of $u$ is a subword that is
isomorphic to $\Gamma^\eta$. A subword is uniform if it is $\Gamma$-uniform for some $\Gamma \subseteq \Sigma$. A uniform
subword is a maximal uniform subword if it is not properly contained in another
uniform subword. Now let $v$ be a subword such that no point of $v$ is contained in a uniform
subword of $u$. Then $v$ is successor-closed if for each point $p$ of $v$, whenever the
successor and the predecessor of $p$ exist, they are contained in $v$ as well. A successor-closed
subword is minimal if it does not strictly contain another successor-closed subword.
Following [2] we define:

Definition 4.2 (blocks). Let $u$ be a regular word. A block of $u$ is either a maximal
uniform subword of $u$ or a minimal successor-closed subword of $u$.

A regular word which consists of a single block is called primitive.* By [2] a word $u$ is
primitive if and only if it is of one of the following forms (where $x, z \in \Sigma^+$, $y \in \Sigma^*$):
A finite non-empty word, a scattered word of the form $x^{\overline{\omega}} y$, a scattered word of the form
$y z^{\omega}$, a scattered word of the form $x^{\overline{\omega}} y z^{\omega}$, or a uniform word ($\Gamma^\eta$ for some $\Gamma \subseteq \Sigma$).
Let $D(\Sigma)$ be the set of all primitive words over $\Sigma$.

Let $u$ be a regular word. Each point $p$ of $u$ belongs to some unique block $\mathrm{Bl}(p)$,
which induces a regular (and hence primitive) word. Moreover we can order the blocks
of $u$ linearly by setting $\mathrm{Bl}(p) \leq \mathrm{Bl}(q)$ if and only if $p \leq q$. The order obtained that
way is denoted $(\mathrm{Bl}(u); \leq)$. Then we extend the order $(\mathrm{Bl}(u); \leq)$ to a generalized word
over $D(\Sigma)$ (here it is useful to allow infinite alphabets, since $D(\Sigma)$ is infinite), called
the skeleton of $u$, by labeling each block with the corresponding isomorphic word in
$D(\Sigma)$. Implicitly, it is shown in [2] that for every regular word $u$ there exists a finite
subset of $D(\Sigma)$ such that every block of $u$ is isomorphic to a primitive word from that
finite subset. Moreover, the skeleton of $u$ is again a regular word. Later it will be convenient to have
the following renaming notion available. Let $V$ be a finite alphabet, let $\varphi : V \to D(\Sigma)$
be an injective mapping and suppose that all blocks of a regular word $u$ belong to the
image of $\varphi$. The word $v$ that has $(\mathrm{Bl}(u); \leq)$ as underlying order and each block $B$ of
$u$ labeled with $\varphi^{-1}(B)$ is called the $\varphi$-skeleton of $u$. We will need the following result
from [2]:

Proposition 4.3 (see [2, Corollary 73]). Let $u, v \in \mathrm{Reg}(\Sigma)$. Let $V$ be a finite alphabet
and let $\varphi : V \to D(\Sigma)$ be injective such that all blocks of $u$ and $v$ are in the image of $\varphi$.
Then $u$ and $v$ are isomorphic if and only if the $\varphi$-skeletons of $u$ and $v$ are isomorphic.

We will consider finite and infinite sequences, whose symbols are regular words and
where the underlying order type is either finite, $\omega$ or $\overline{\omega}$. In the following, when writing
$(u_i)_{i \in I}$, we assume that either $I = \{1, \ldots, n\} \neq \emptyset$ (i.e., $(u_i)_{i \in I}$ is the finite sequence
$(u_1, \ldots, u_n)$) or $I = \{1, 2, 3, \ldots\}$ (i.e., $(u_i)_{i \in I}$ is the infinite sequence $(u_1, u_2, u_3, \ldots)$)
or $I = \{\ldots, -2, -1, 0\}$ (i.e., $(u_i)_{i \in I}$ is the infinite sequence $(\ldots, u_{-2}, u_{-1}, u_0)$).
The corresponding generalized word is $\prod_{i \in I} u_i$ (either $u_1 \cdots u_n$, or $u_1 u_2 u_3 \cdots$, or
$\cdots u_{-2} u_{-1} u_0$). We say that two sequences $(u_i)_{i \in I}$ and $(v_j)_{j \in J}$ are equivalent if the
generalized words $\prod_{i \in I} u_i$ and $\prod_{j \in J} v_j$ are isomorphic. We use commas to separate
the successive $u_i$ in the sequence $(u_i)_{i \in I}$ in order to avoid misinterpretations. For
instance, $(a, a)$ viewed as a sequence over regular words has length two whereas $(aa)$ has
length 1. Of course, $(a, a)$ and $(aa)$ are equivalent sequences.

* In combinatorics on words, a finite word is called primitive, if it is not a proper power of a 
non-empty word. Our notion of a primitive word should not be confused with this definition. 
Definition 4.4. Let $u = (u_i)_{i \in I}$ be a sequence of regular words. We say that $u$ does not
merge if the set of blocks of $\prod_{i \in I} u_i$ is the union of the sets of blocks of the $u_i$. If this is
not the case, then we say that $u$ merges.

In other words, $u$ merges if there exists a block that contains elements from two different
$u_i$. In [2, Corollary 32] it is shown that a sequence $u$ merges if and only if there exists
a factor $(u_i, u_{i+1})$ or $(u_i, u_{i+1}, u_{i+2})$ that merges.

Example 4.5. Clearly if $u$ and $v$ are finite words, then $(u, v)$ merges. Also, $(\Gamma^\eta, \Gamma^\eta)$
and $(\Gamma^\eta, a, \Gamma^\eta)$ merge for every $\Gamma \subseteq \Sigma$ and $a \in \Gamma$ (in both cases, the sequence is
equivalent to $\Gamma^\eta$). On the other hand, $([ab]^\eta, [ab]^\eta)$ does not merge. The reason is that
the blocks of $[ab]^\eta$ are the copies of $ab$. More generally, if $u$ is not primitive and $X$ is a
finite subset of regular words, then $((X \cup \{u\})^\eta, (X \cup \{u\})^\eta)$ does not merge.

For the case of a sequence of primitive words, a complete description of merging
sequences was given in [2]. Moreover, if a sequence of primitive words merges, then it
can be simplified to a non-merging sequence of primitive words. To make this more
precise, let $u, v, w$ be primitive words. If $(u, v)$ merges, then by [2, Lemma 24] either $u$
and $v$ are $\Gamma$-uniform for some $\Gamma \subseteq \Sigma$ or $u$ is right-closed and $v$ is left-closed. Then, the
regular word $uv$ has a single block. If $(u, v, w)$ merges, then by [2, Lemma 24] either
$(u, v)$ merges, or $(v, w)$ merges, or $u, w$ are $\Gamma$-uniform and $v$ is a singleton from $\Gamma$.
This motivates the definition of the following rewriting system $R$ over finite sequences
over $D(\Sigma)$.

Definition 4.6 (rewriting system $R$). The rewriting system $R$ over the set $D(\Sigma)$
consists of the following rules:

- $(u_1, u_2, u_3) \to u$ if $u_1 = u_3 = u = \Gamma^\eta$ for some $\Gamma \subseteq \Sigma$ and $u_2 \in \Gamma$

- $(u_1, u_2) \to u$ if one of the following holds:

• $u_1$ is right-closed, $u_2$ is left-closed and $u = u_1 u_2$

• $u_1 = u_2 = u = \Gamma^\eta$ for some $\Gamma \subseteq \Sigma$.

In the following, we will use some basic facts from rewriting theory, see e.g. [4] for
further details. For sequences $x$ and $y$ over $\mathrm{Reg}(\Sigma)$, we write $x \to_R y$ if there exist
a rewrite rule $u \to v$ and an occurrence of the sequence $u$ in $x$ such that replacing
that occurrence by $v$ gives the sequence $y$. Here, $x$ and $y$ may be infinite sequences.
Moreover, those $x_i$ of $x = (x_i)_{i \in I}$ that are not primitive are left untouched in the
rewrite step $x \to_R y$. Clearly, $x \to_R y$ implies that the sequences $x$ and $y$ are
equivalent. A (possibly infinite) sequence $u$ is irreducible w.r.t. $R$ if there does not exist
a sequence $v$ with $u \to_R v$. Clearly, on infinite sequences, $R$ cannot be terminating
(e.g., $(a^\eta, a^\eta, \ldots) \to_R (a^\eta, a^\eta, \ldots)$ is a loop). On the other hand, $R$ is trivially
terminating on finite sequences, since it is length-reducing. Moreover, by analyzing
overlapping left-hand sides of $R$, one can easily show:

Lemma 4.7. The rewriting system $R$ is strongly confluent (on finite and infinite
sequences), i.e., for all $u, v, w$ such that $u \to_R v$ and $u \to_R w$ there exists $x$ such that
($v = x$ or $v \to_R x$) and ($w = x$ or $w \to_R x$).

By a simple fact from rewriting theory, it follows that $R$ is also confluent, i.e., for
all $u, v, w$ such that $u \to_R^* v$ and $u \to_R^* w$ there exists $x$ such that $v \to_R^* x$ and
$w \to_R^* x$. Termination (on finite sequences) and confluence imply that $R$ produces
unique normal forms for finite sequences, i.e., for every finite sequence $u$ there exists a
unique finite sequence $v$ such that $u \to_R^* v$ and $v$ is irreducible w.r.t. $R$. This $v$ is called
the irreducible normal form of $u$.

The following is a direct consequence of [2, Lemma 24 & Corollary 32]. 

Lemma 4.8. Let $u$ be a sequence of primitive words. Then $u$ does not merge if and only
if $u$ is irreducible w.r.t. $R$.

We also have to verify that a sequence $u$ over $\mathrm{Reg}(\Sigma)$ containing non-primitive words
does not merge. We use the definition below. Note that a regular word need not have 
a first or last block. For instance, (a")" has a first block but no last block, whereas 
(a")" (a")" and [aa]'' neither have a first block nor a last block. 

Definition 4.9 (good and semi-good sequences). The sequence $u = (u_i)_{i \in I}$ is good if
the following conditions hold:

(1) $u$ is irreducible with respect to $R$.

(2) For all $i \in I$ we have:

(a) If $u_i$ is not primitive and has a first block, then either ($i-1 \in I$, $u_{i-1}$ is
uniform, and $(u_{i-1}, u_i)$ does not merge) or ($i-1, i-2 \in I$, $u_{i-1}$ and $u_{i-2}$
are primitive, and $(u_{i-2}, u_{i-1}, u_i)$ does not merge).

(b) If $u_i$ is not primitive and has a last block, then either ($i+1 \in I$, $u_{i+1}$ is uniform,
and $(u_i, u_{i+1})$ does not merge) or ($i+1, i+2 \in I$, $u_{i+1}$ and $u_{i+2}$ are primitive,
and $(u_i, u_{i+1}, u_{i+2})$ does not merge).

If only (2) holds, then $u$ is said to be semi-good.

Lemma 4.10. If $u$ is good, then $u$ does not merge.

Proof. Assume that $u$ is good but merges. By [2, Corollary 32], one of the following
cases holds:

Case 1. $u$ contains a factor $(u_i, u_{i+1})$ that merges. If $u_i$ and $u_{i+1}$ were both
primitive, then $u$ would not be irreducible, which is a contradiction ($u$ is good). Hence, $u_i$
or $u_{i+1}$ must be non-primitive. W.l.o.g. assume that $u_i$ is not primitive (the other case
is symmetric). If $u_i$ has no last block, then [2, Corollary 30(1)] implies that $(u_i, u_{i+1})$
does not merge, which is a contradiction. Hence, we can assume that $u_i$ has a last block.
But then, since $u$ is good, $(u_i, u_{i+1})$ does not merge, which is again a contradiction.

Case 2. $u$ contains a factor $(u_i, u_{i+1}, u_{i+2})$ that merges but neither $(u_i, u_{i+1})$ nor
$(u_{i+1}, u_{i+2})$ merges. Since $u$ is irreducible w.r.t. $R$, it follows that $u_i$, $u_{i+1}$, or $u_{i+2}$
is not primitive. The case that $u_{i+2}$ is not primitive is symmetric to the case that $u_i$ is
not primitive. Hence, it suffices to consider the following two subcases:

Case 2a. $u_i$ is not primitive. If $u_i$ has no last block, then [2, Corollary 31(1)] implies
that $(u_i, u_{i+1}, u_{i+2})$ does not merge, which is a contradiction. Hence, we can
assume that $u_i$ has a last block, call it $b_i$. Since $u$ is good and $(u_i, u_{i+1}, u_{i+2})$ merges,
$u_{i+1}$ must be uniform. If $u_{i+2}$ has no first block, then again [2, Corollary 31(1)] implies
that $(u_i, u_{i+1}, u_{i+2})$ does not merge, which is a contradiction. Let $b_{i+2}$ be the
first block of $u_{i+2}$. Moreover, [2, Corollary 31(2)] implies that $(b_i, u_{i+1}, b_{i+2})$ merges.
Since $(u_i, u_{i+1})$ and $(u_{i+1}, u_{i+2})$ do not merge, also $(b_i, u_{i+1})$ and $(u_{i+1}, b_{i+2})$ do not
merge. It follows (from the form of our rewriting system $R$) that $b_i = b_{i+2}$ is uniform
and $u_{i+1}$ is a singleton word. But we have already shown that $u_{i+1}$ is uniform, which
is a contradiction.

Case 2b. $u_{i+1}$ is not primitive. Then $u_{i+1}$ has more than one block and [2, Corollary 31(1)]
directly implies that $(u_i, u_{i+1}, u_{i+2})$ does not merge, which is again a
contradiction. □

Lemma 4.11. If $u$ is semi-good and $u \to_R v$, then $v$ is semi-good as well.

Proof. Assume that $u = (u_i)_{i \in I}$ is semi-good and $u \to_R v$. We have to show that
$v = (v_j)_{j \in J}$ is semi-good. For this, consider a $j \in J$ such that $v_j$ is not primitive.
Since the system $R$ does not introduce non-primitive words, $v_j$ must have been already
present in $u$. Let $i \in I$ be the position in $u$ that corresponds to position $j$ in $v$. Hence,
$u_i = v_j$. By symmetry it suffices to show that condition (2a) from Definition 4.9 holds
for $j \in J$. The case that $u_i = v_j$ has no first block is clear. So, assume that $u_i$ has a first
block. Since $u$ is semi-good, we can distinguish the following two cases.

Case 1. $i-1 \in I$, $u_{i-1}$ is uniform, and $(u_{i-1}, u_i)$ does not merge. From the form of the
rewrite rules, it follows that $v_{j-1} = u_{i-1}$. Hence, $v_{j-1}$ is uniform, and $(v_{j-1}, v_j) =
(u_{i-1}, u_i)$ does not merge. Thus, we have shown condition (2a) from Definition 4.9 for $j$.

Case 2. $i-1, i-2 \in I$, $u_{i-2}, u_{i-1}$ are primitive, and $(u_{i-2}, u_{i-1}, u_i)$ does not merge.
We make a case distinction on the position where the rewrite rule is applied.

Case 2a. $i-3 \in I$ and in the rewrite step $u \to_R v$, $(u_{i-3}, u_{i-2}, u_{i-1})$ is replaced by
$u \in D(\Sigma)$. Thus, $u_{i-3} = u_{i-1} = u$ is uniform. Hence, $v_{j-1} = u$ is uniform. Moreover,
$(v_{j-1}, v_j) = (u_{i-1}, u_i)$ does not merge.

Case 2b. $i-4 \in I$ and in the rewrite step $u \to_R v$, $(u_{i-4}, u_{i-3}, u_{i-2})$ is replaced by
$u \in D(\Sigma)$. Thus, $u_{i-4} = u_{i-2} = u$ is uniform, $v_{j-2} = u = u_{i-2}$, and $u_{i-1} = v_{j-1}$. It
follows that $v_{j-2}$ and $v_{j-1}$ are primitive, and that $(v_{j-2}, v_{j-1}, v_j) = (u_{i-2}, u_{i-1}, u_i)$
does not merge.

Case 2c. In the rewrite step $u \to_R v$, $(u_{i-2}, u_{i-1})$ is replaced by $u \in D(\Sigma)$. Then,
$(u_{i-2}, u_{i-1})$ merges. But this contradicts the assumption that $(u_{i-2}, u_{i-1}, u_i)$ does not
merge.

Case 2d. $i-3 \in I$ and in the rewrite step $u \to_R v$, $(u_{i-3}, u_{i-2})$ is replaced by
$u \in D(\Sigma)$. If $u_{i-3} = u_{i-2} = u$ is uniform, then $v_{j-2} = u_{i-2}$ and $v_{j-1} = u_{i-1}$ are
primitive and $(v_{j-2}, v_{j-1}, v_j) = (u_{i-2}, u_{i-1}, u_i)$ does not merge. Finally, assume that
$u_{i-3}$ is right-closed, $u_{i-2}$ is left-closed and $v_{j-2} = u = u_{i-3} u_{i-2}$. We have $v_{j-1} =
u_{i-1}$. Thus $v_{j-1}$ and $v_{j-2}$ are primitive. It remains to show that $(v_{j-2}, v_{j-1}, v_j) =
(u_{i-3} u_{i-2}, u_{i-1}, u_i)$ does not merge. We know that $(u_{i-1}, u_i)$ does not merge (since
$(u_{i-2}, u_{i-1}, u_i)$ does not merge). Assume that $(u_{i-3} u_{i-2}, u_{i-1})$ merges. Then (since
$u_{i-3} u_{i-2}$ is primitive and scattered and $u_{i-1}$ is primitive) $u_{i-3} u_{i-2}$ must be
right-closed and $u_{i-1}$ must be left-closed. But then, $u_{i-2} \neq \varepsilon$ is right-closed as well and
$(u_{i-2}, u_{i-1})$ merges. This is a contradiction. Hence, $(u_{i-3} u_{i-2}, u_{i-1})$ does not merge.
Let $b_i$ be the first block of $u_i$. If $(u_{i-3} u_{i-2}, u_{i-1}, u_i)$ merges, then by
[2, Corollary 31(2)], $(u_{i-3} u_{i-2}, u_{i-1}, b_i)$ merges. Since neither $(u_{i-3} u_{i-2}, u_{i-1})$ nor $(u_{i-1}, b_i)$
merges, $u_{i-3} u_{i-2}$ and $b_i$ must be uniform. But we know that $u_{i-3} u_{i-2}$ is scattered,
which leads again to a contradiction. Thus, indeed $(u_{i-3} u_{i-2}, u_{i-1}, u_i)$ does not merge.

If the rewrite rule is applied at a position different from those considered in Cases 2a-2d,
then $(v_{j-2}, v_{j-1}, v_j) = (u_{i-2}, u_{i-1}, u_i)$. Since $(u_{i-2}, u_{i-1}, u_i)$ fulfills condition (2a)
from Definition 4.9, so does $(v_{j-2}, v_{j-1}, v_j)$. This concludes the proof of the lemma. □

Lemma 4.11 implies that from a given finite semi-good sequence $u$ we can compute an
equivalent good sequence, by computing the (unique) irreducible normal form of $u$.



4.3 Expressions and succinct expressions 

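Regular words can be naturally described by expressions using the operations of
concatenation, $\omega$-power, $\overline{\omega}$-power, and shuffle. Formally, the set $T(V, \Sigma)$ of expressions
over $V$ and $\Sigma$ is inductively defined as follows:

(a) $V \cup \Sigma \subseteq T(V, \Sigma)$.

(b) If $\alpha_1, \ldots, \alpha_n \in T(V, \Sigma)$ ($n \geq 1$), then $\alpha_1 \cdots \alpha_n \in T(V, \Sigma)$.

(c) If $\alpha \in T(V, \Sigma)$, then $\alpha^\omega \in T(V, \Sigma)$ and $\alpha^{\overline{\omega}} \in T(V, \Sigma)$.

(d) If $\alpha_1, \ldots, \alpha_n \in T(V, \Sigma)$ ($n \geq 1$), then $[\alpha_1, \ldots, \alpha_n]^\eta \in T(V, \Sigma)$.

A mapping $f : V \to \mathrm{Reg}(\Sigma)$ will be extended homomorphically to a mapping
$f : T(V, \Sigma) \to \mathrm{Reg}(\Sigma)$ inductively as follows, where $\alpha, \alpha_1, \ldots, \alpha_n \in T(V, \Sigma)$:

- $f(a) = a$ for $a \in \Sigma$

- $f(\alpha_1 \cdots \alpha_n) = f(\alpha_1) \cdots f(\alpha_n)$

- $f(\alpha^\omega) = f(\alpha)^\omega$

- $f(\alpha^{\overline{\omega}}) = f(\alpha)^{\overline{\omega}}$

- $f([\alpha_1, \ldots, \alpha_n]^\eta) = [f(\alpha_1), \ldots, f(\alpha_n)]^\eta$

For $\alpha \in T(V, \Sigma)$ we define the size $|\alpha| \in \mathbb{N}$ inductively as follows:

- $|\alpha| = 1$ for $\alpha \in V \cup \Sigma$

- $|\alpha_1 \cdots \alpha_n| = |\alpha_1| + \cdots + |\alpha_n|$

- $|\alpha^\omega| = |\alpha^{\overline{\omega}}| = |\alpha| + 1$

- $|[\alpha_1, \ldots, \alpha_n]^\eta| = |\alpha_1| + \cdots + |\alpha_n| + 1$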
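The inductive definitions above translate directly into a small syntax tree. The following is only an illustrative sketch: the class names `Sym`, `Concat`, `Omega`, `OmegaBar`, and `Shuffle` are hypothetical, and `size` mirrors the four size clauses one by one.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Sym:
    """A variable from V or a letter from Sigma."""
    name: str


@dataclass(frozen=True)
class Concat:
    parts: Tuple["Expr", ...]


@dataclass(frozen=True)
class Omega:
    body: "Expr"


@dataclass(frozen=True)
class OmegaBar:
    body: "Expr"


@dataclass(frozen=True)
class Shuffle:
    parts: Tuple["Expr", ...]


Expr = Union[Sym, Concat, Omega, OmegaBar, Shuffle]


def size(e: Expr) -> int:
    """Size of an expression, following the inductive definition clause by clause."""
    if isinstance(e, Sym):
        return 1
    if isinstance(e, Concat):
        return sum(size(p) for p in e.parts)
    if isinstance(e, (Omega, OmegaBar)):
        return size(e.body) + 1
    if isinstance(e, Shuffle):
        return sum(size(p) for p in e.parts) + 1
    raise TypeError(f"not an expression: {e!r}")
```

For instance, the expression $[ab, a^\omega]^\eta$ has size $(2 + 2) + 1 = 5$.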

A succinct expression system (SES) is a tuple $\mathbb{A} = (V, \Sigma, \mathrm{rhs})$ such that:

- $V$ (the set of variables) and $\Sigma$ (the terminal alphabet) are disjoint finite alphabets.

- $\mathrm{rhs}$ (for right-hand side) is a mapping from $V$ to $T(V, \Sigma)$ such that the relation
$\{(Y, X) \in V \times V \mid Y \text{ occurs in } \mathrm{rhs}(X)\}$ is acyclic. The reflexive transitive closure of
this relation is called the hierarchical order of $\mathbb{A}$ and denoted by $\leq_{\mathbb{A}}$.

The property for $\mathrm{rhs}$ ensures that there exists a unique mapping $\mathrm{val}_{\mathbb{A}} : V \to \mathrm{Reg}(\Sigma)$
such that $\mathrm{val}_{\mathbb{A}}(X) = \mathrm{val}_{\mathbb{A}}(\mathrm{rhs}(X))$ for all $X \in V$. If $\mathbb{A}$ is clear from the context, we
will simply write $\mathrm{val}(X)$.

In the following, a quadruple $\mathbb{A} = (V, \Sigma, \mathrm{rhs}, S)$ where $(V, \Sigma, \mathrm{rhs})$ is as above and
$S \in V$ (i.e., an SES with a distinguished start variable $S$) will be called a succinct
expression. In this case let us set $\mathrm{val}(\mathbb{A}) = \mathrm{val}_{\mathbb{A}}(S)$. A succinct expression may also be
seen as a dag (directed acyclic graph), whose unfolding is an expression in the above
sense.

Example 4.12. Consider the succinct expression 

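$\mathbb{A} = (\{X_1, X_2, X_3, X_4, X_5, X_6\}, \{a, b\}, \mathrm{rhs}, X_1)$

with

$\mathrm{rhs}(X_1) = [X_2, X_3]^\eta$    $\mathrm{rhs}(X_2) = $    $\mathrm{rhs}(X_3) = $

$\mathrm{rhs}(X_4) = X_5 X_6$    $\mathrm{rhs}(X_5) = ab$    $\mathrm{rhs}(X_6) = ba$.

We have $\mathrm{val}(\mathbb{A}) = [abbaabba, abbaabbaabbaabba]^\eta$. The corresponding dag looks as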
follows: 




Nodes labelled with o compute the concatenation of their successor nodes. In case the 
order of the successor nodes matters, we specify it by edge labels. 

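For an SES $\mathbb{A}$ we define

$|\mathbb{A}| = \sum_{X \in V} |\mathrm{rhs}(X)|$.

An SES $\mathbb{A} = (V, \Sigma, \mathrm{rhs})$ is in normal form if all right-hand sides are in $(V \cup \Sigma)^+$ or of
the form $Y^\omega$, $Y^{\overline{\omega}}$, $[Y_1, \ldots, Y_n]^\eta$ for some $Y, Y_1, \ldots, Y_n \in V \cup \Sigma$. For such an SES $\mathbb{A}$,
we define $\mathrm{depth}_{\mathbb{A}}(X)$ and $\omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(X)$ for $X \in V$ inductively as follows (below,
we set $\mathrm{depth}_{\mathbb{A}}(a) = \omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(a) = 0$ for $a \in \Sigma$):

- If $\mathrm{rhs}(X) = Y_1 \cdots Y_n$ ($n \geq 1$, $Y_1, \ldots, Y_n \in \Sigma \cup V$), then

$\mathrm{depth}_{\mathbb{A}}(X) = \max(\mathrm{depth}_{\mathbb{A}}(Y_1), \ldots, \mathrm{depth}_{\mathbb{A}}(Y_n)) + 1$,
$\omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(X) = \max(\omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(Y_1), \ldots, \omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(Y_n))$.

- If $\mathrm{rhs}(X) = Y^\omega$ or $\mathrm{rhs}(X) = Y^{\overline{\omega}}$, then

$\mathrm{depth}_{\mathbb{A}}(X) = \mathrm{depth}_{\mathbb{A}}(Y) + 1$,
$\omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(X) = \omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(Y) + 1$.

- If $\mathrm{rhs}(X) = [Y_1, \ldots, Y_n]^\eta$, then

$\mathrm{depth}_{\mathbb{A}}(X) = \max(\mathrm{depth}_{\mathbb{A}}(Y_1), \ldots, \mathrm{depth}_{\mathbb{A}}(Y_n)) + 1$,
$\omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(X) = \max(\omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(Y_1), \ldots, \omega\eta\text{-}\mathrm{depth}_{\mathbb{A}}(Y_n)) + 1$.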
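Both depth measures can be computed in one bottom-up pass over the acyclic variable dependency relation. The sketch below assumes a hypothetical encoding of a normal-form SES as a dictionary mapping each variable to a pair `(kind, arguments)`, where `kind` is one of `"concat"`, `"omega"`, `"omega_bar"`, `"shuffle"`; every symbol that is not a key is treated as a terminal of depth 0.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: variable -> (kind, list of argument symbols).
NormalFormSES = Dict[str, Tuple[str, List[str]]]


def depths(rhs: NormalFormSES) -> Dict[str, Tuple[int, int]]:
    """Return {X: (depth(X), omega-eta-depth(X))} for every variable X."""
    memo: Dict[str, Tuple[int, int]] = {}

    def go(x: str) -> Tuple[int, int]:
        if x not in rhs:          # terminal symbol: both depths are 0
            return (0, 0)
        if x not in memo:
            kind, args = rhs[x]
            children = [go(y) for y in args]
            depth = max(d for d, _ in children) + 1
            wn_depth = max(w for _, w in children)
            if kind != "concat":  # omega, omega-bar and shuffle increase it
                wn_depth += 1
            memo[x] = (depth, wn_depth)
        return memo[x]

    return {x: go(x) for x in rhs}
```

Memoization ensures that each variable is processed once, so the running time is linear in the size of the system.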
Straight-line programs. A succinct expression in which all right-hand sides belong to
$(V \cup \Sigma)^+$ is called a straight-line program (SLP) [25]. In this case, $\mathrm{val}(\mathbb{A})$ is a finite
non-empty word. An SLP $\mathbb{A}$ can be viewed as a succinct representation of the word
$\mathrm{val}(\mathbb{A})$. More precisely, the length of $\mathrm{val}(\mathbb{A})$ may be exponential in $|\mathbb{A}|$. We will make
heavy use of the fact that certain algorithmic problems on SLP-encoded finite words
can be solved in polynomial time. More precisely, we use the following results:

Remark 4.13. There exist polynomial time algorithms for the following problems:

(a) Given an SLP $\mathbb{A}$, calculate $|\mathrm{val}(\mathbb{A})|$.

(b) Given an SLP $\mathbb{A}$ and a number $k \in \mathbb{N}$ (coded in binary), produce an SLP $\mathbb{B}$
of size $|\mathbb{A}| + O(\log k)$ such that $\mathrm{val}(\mathbb{B}) = \mathrm{val}(\mathbb{A})^k$.

(c) Given an SLP $\mathbb{A}$ and numbers $i \leq j \leq |\mathrm{val}(\mathbb{A})|$, compute an SLP $\mathbb{B}$ with $\mathrm{val}(\mathbb{B}) =
\mathrm{val}(\mathbb{A})[i : j]$. Here $w[i : j] = a_i \cdots a_j$ for a finite word $w = a_1 \cdots a_n$.

(d) Given SLPs $\mathbb{A}$ and $\mathbb{B}$, decide whether $\mathrm{val}(\mathbb{A}) = \mathrm{val}(\mathbb{B})$ [24].

(e) Given SLPs $\mathbb{A}$ and $\mathbb{B}$, decide whether $\mathrm{val}(\mathbb{A})$ is a factor of $\mathrm{val}(\mathbb{B})$ [11, 20, 22].

The proofs for (a), (b), and (c) are straightforward.
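To make items (a) and (b) concrete, here is a minimal sketch under a hypothetical encoding: an SLP is a dictionary mapping each variable to a list of symbols, and any symbol that is not a key is a terminal letter. `slp_length` computes $|\mathrm{val}(X)|$ bottom-up without expanding the word, and `slp_power` uses repeated squaring, which is one standard way to stay within $O(\log k)$ additional size. The fresh variable names `_P0`, `_P1`, ... are an assumption of this sketch.

```python
from typing import Dict, List, Tuple

SLP = Dict[str, List[str]]  # variable -> right-hand side; non-keys are terminals


def slp_length(rhs: SLP, start: str) -> int:
    """Length of val(start), computed without expanding the word."""
    memo: Dict[str, int] = {}

    def length(x: str) -> int:
        if x not in rhs:
            return 1
        if x not in memo:
            memo[x] = sum(length(y) for y in rhs[x])
        return memo[x]

    return length(start)


def slp_power(rhs: SLP, start: str, k: int) -> Tuple[SLP, str]:
    """Return an SLP and start variable evaluating to val(start)^k, for k >= 1."""
    if k < 1:
        raise ValueError("k must be at least 1")
    new = dict(rhs)
    counter = 0

    def fresh() -> str:
        nonlocal counter
        while f"_P{counter}" in new:
            counter += 1
        name = f"_P{counter}"
        counter += 1
        return name

    cur, factors = start, []       # cur evaluates to val(start)^(2^i)
    while k > 0:
        if k & 1:
            factors.append(cur)    # this power of two occurs in k
        k >>= 1
        if k > 0:
            square = fresh()
            new[square] = [cur, cur]
            cur = square
    top = fresh()
    new[top] = factors
    return new, top


def slp_expand(rhs: SLP, x: str) -> str:
    """Expand val(x) explicitly; only sensible for small examples."""
    if x not in rhs:
        return x
    return "".join(slp_expand(rhs, y) for y in rhs[x])
```

A chain of 60 doubling rules on top of a single letter has size about 120 but describes a word of length $2^{60}$, which is why `slp_length` works on lengths only and never calls `slp_expand`.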

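2-level systems. A 2-level system is a tuple $\mathbb{A} = (\mathrm{Up}, \mathrm{Lo}, \Sigma, \mathrm{rhs})$ such that the following
holds ($f{\restriction}_A$ denotes the restriction of a function $f$ to the set $A$):

- The tuple $(\mathrm{Up}, \mathrm{Lo}, \mathrm{rhs}{\restriction}_{\mathrm{Up}})$ is an SES (w.l.o.g. in normal form) over the terminal
alphabet $\mathrm{Lo}$.

- The tuple $(\mathrm{Lo}, \Sigma, \mathrm{rhs}{\restriction}_{\mathrm{Lo}})$ is an SES over the terminal alphabet $\Sigma$.

The set $\mathrm{Up}$ (resp. $\mathrm{Lo}$) is called the set of upper level variables (lower level variables)
of $\mathbb{A}$. Moreover, we set $V = \mathrm{Up} \cup \mathrm{Lo}$ and call it the set of variables of $\mathbb{A}$. The SES
$(\mathrm{Up}, \mathrm{Lo}, \mathrm{rhs}{\restriction}_{\mathrm{Up}})$ is called the upper part of $\mathbb{A}$, briefly $\mathrm{up}(\mathbb{A})$, and the SES $(\mathrm{Lo}, \Sigma, \mathrm{rhs}{\restriction}_{\mathrm{Lo}})$
is the lower part of $\mathbb{A}$, briefly $\mathrm{lo}(\mathbb{A})$. The upper level evaluation mapping $\mathrm{uval}_{\mathbb{A}} : \mathrm{Up} \to
\mathrm{Reg}(\mathrm{Lo})$ of $\mathbb{A}$ is defined as $\mathrm{uval}_{\mathbb{A}} = \mathrm{val}_{\mathrm{up}(\mathbb{A})}$. The evaluation mapping $\mathrm{val}_{\mathbb{A}}$ is defined by
$\mathrm{val}_{\mathbb{A}}(X) = \mathrm{val}_{\mathrm{lo}(\mathbb{A})}(\mathrm{val}_{\mathrm{up}(\mathbb{A})}(X))$ for $X \in \mathrm{Up}$ and $\mathrm{val}_{\mathbb{A}}(X) = \mathrm{val}_{\mathrm{lo}(\mathbb{A})}(X)$ for $X \in \mathrm{Lo}$.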

4.4 Heilbrunner's algorithm 

Theorem 4.14. From a given partitioned DFA $\mathcal{A}$, we can compute in polynomial time
a succinct expression $\mathbb{A}$ such that $w(\mathcal{A}) = \mathrm{val}(\mathbb{A})$.

Proof. There is nothing new about the proof. We just have to follow Heilbrunner's
algorithm carefully. Let $\mathcal{A} = (Q, \Gamma, \delta, q_0, (F_a)_{a \in \Sigma})$ be a partitioned DFA and let
$F = \bigcup_{a \in \Sigma} F_a$. We can assume that every state in $F$ is a dead end, i.e., does not have
outgoing transitions. For this, take a new symbol $\$$, as well as a copy $q'$ together with
the transition $(q, \$, q')$ for every final state $q \in F$. We set $F'_a = \{q' \mid q \in F_a\}$ and let
$\$$ be the smallest symbol in $\Gamma \cup \{\$\}$. The resulting partitioned DFA produces the same
generalized word as $\mathcal{A}$.

So, assume that every state in $F$ is a dead end. W.l.o.g. we can also assume that
$\mathcal{A}$ is coaccessible. The variables of the succinct expression $\mathbb{A}$ will be the states of $\mathcal{A}$.
Consider a state $p \in Q$ and let $(p, a_i, q_i)$ ($1 \leq i \leq k$) be all outgoing transitions for
$p$, where $a_1 < a_2 < \cdots < a_k$. Let us define $\mathrm{out}(p) = q_1 q_2 \cdots q_k$. Next, consider
the graph with node set $Q$ and an edge from $p \in Q$ to $q \in Q$ if there is a transition
from $p$ to $q$. We partition this graph into its strongly connected components (SCCs).
An SCC $C$ is smaller than an SCC $D$ if there exists a path from a state in $C$ to a state
in $D$; this defines a partial order on the set of SCCs. We eliminate all SCCs starting
with the maximal ones. When eliminating an SCC $C$, we define $\mathrm{rhs}_{\mathbb{A}}(p)$ for each state
$p \in C$. If the SCC $C$ is a singleton set $\{p\}$ with $p \in F_a$, then we set $\mathrm{rhs}_{\mathbb{A}}(p) = a$. If
the SCC $C = \{p\}$ is a singleton set with $p \notin F$, then we set $\mathrm{rhs}_{\mathbb{A}}(p) = \mathrm{out}(p)$. Note
that $\mathrm{out}(p) \neq \varepsilon$, since $p \notin F$ and $\mathcal{A}$ is coaccessible. Now, consider an SCC $C$ of size
$|C| \geq 2$. Then every word $\mathrm{out}(p)$ ($p \in C$) contains at least one occurrence of a state
from $C$. Hence $\mathrm{out}(p)$ can be factored as $\mathrm{out}(p) = u_p x_p v_p$, where $u_p$ and $v_p$ do not
contain occurrences of states from the SCC $C$ (i.e., all states occurring in $u_p$ and $v_p$
belong to larger SCCs), and $x_p$ starts and ends with a state from $C$ ($x_p$ might consist
of a single state from $C$). Define functions $\ell : C \to C$ and $r : C \to C$ as follows:
$\ell(p)$ (resp. $r(p)$) is the first (resp. last) state of the word $x_p$. Then, for every $p \in C$, the
sequences $p, \ell(p), \ell^2(p), \ldots$ and $p, r(p), r^2(p), \ldots$ become periodic after at most $|C|$
steps. We now define regular expressions $\ell_p$ and $r_p$ as follows: Let $p_0, p_1, \ldots, p_a$ and
$q_0, q_1, \ldots, q_c$ be shortest sequences such that $p_0 = q_0 = p$, $p_{i+1} = \ell(p_i)$, $q_{i+1} = r(q_i)$,
and $\ell(p_a) \in \{p_0, p_1, \ldots, p_a\}$, $r(q_c) \in \{q_0, q_1, \ldots, q_c\}$. Assume that $\ell(p_a) = p_b$ and
$r(q_c) = q_d$ for $0 \leq b \leq a$, $0 \leq d \leq c$. Then, we define

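$\ell_p = (u_{p_0} \cdots u_{p_{b-1}})(u_{p_b} \cdots u_{p_a})^\omega$,    $r_p = (v_{q_c} \cdots v_{q_d})^{\overline{\omega}}(v_{q_{d-1}} \cdots v_{q_0})$.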

Next, let $T$ be the set of all regular expressions of the form $r_s\, y\, \ell_t$ ($s, t \in C$) such that
some word $\mathrm{out}(p)$ ($p \in C$) contains a factor $s y t$, where the word $y$ does not contain
a state from $C$. Then we finally set $\mathrm{rhs}_{\mathbb{A}}(p) = \ell_p [T]^\eta r_p$ for all $p \in C$. This concludes
the elimination step for the SCC $C$. By [13], for every state $p \in Q$ we have
$w(Q, \Gamma, \delta, p, (F_a)_{a \in \Sigma}) = \mathrm{val}_{\mathbb{A}}(p)$. □
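The sequences $p_0, \ldots, p_a$ and the index $b$ used in the elimination step can be found by iterating $\ell$ until a state repeats. The sketch below is a hypothetical helper, not part of Heilbrunner's algorithm as published: `ell` is a dictionary encoding the function $\ell : C \to C$, and `u` maps each state $p$ to the word $u_p$ (given as a list of states). The same code applies to $r$ and the words $v_p$.

```python
from typing import Dict, List, Tuple


def orbit(f: Dict[str, str], p: str) -> Tuple[List[str], int]:
    """Return ([p_0, ..., p_a], b) for the shortest sequence with
    p_0 = p, p_{i+1} = f[p_i], and f[p_a] = p_b."""
    seq = [p]
    index = {p: 0}
    while f[seq[-1]] not in index:
        nxt = f[seq[-1]]
        index[nxt] = len(seq)
        seq.append(nxt)
    return seq, index[f[seq[-1]]]


def left_expression(ell: Dict[str, str], u: Dict[str, List[str]],
                    p: str) -> Tuple[List[str], List[str]]:
    """Return (prefix, period) so that l_p = prefix . (period)^omega,
    with prefix = u_{p_0} ... u_{p_{b-1}} and period = u_{p_b} ... u_{p_a}."""
    seq, b = orbit(ell, p)
    prefix = [s for q in seq[:b] for s in u[q]]
    period = [s for q in seq[b:] for s in u[q]]
    return prefix, period
```

Since a state must repeat after at most $|C|$ applications of $\ell$, the loop in `orbit` runs at most $|C|$ times.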

By Theorem 4.14, it suffices to prove the following result in order to prove Theorem 4.1.

Theorem 4.15. The following problem can be solved in polynomial time:

INPUT: Two succinct expressions $\mathbb{A}_1$ and $\mathbb{A}_2$.
QUESTION: $\mathrm{val}(\mathbb{A}_1) \cong \mathrm{val}(\mathbb{A}_2)$?

In the next section, we will prove this result for the special case that both $\mathrm{val}(\mathbb{A}_1)$ and
$\mathrm{val}(\mathbb{A}_2)$ are primitive.

4.5 A polynomial time equivalence test for succinct primitive expressions 

By Theorem 4.14, the remaining goal is to test in polynomial time whether two succinct
expressions represent isomorphic regular words. In a first step, we accomplish this for
succinct expressions that represent primitive words. In the following, $\Sigma$ will always
refer to a finite alphabet. Let us first show that we can decide in polynomial time whether
a succinct expression represents a primitive word.

Lemma 4.16. Given a succinct expression $\mathbb{A}$, we can decide in polynomial time whether
$\mathrm{val}(\mathbb{A})$ is a primitive word, and in case it is we can compute in polynomial time a
representation, which has one of the following forms, where $\mathbb{B}, \mathbb{C}, \mathbb{D}$ are SLPs and
$\Gamma \subseteq \Sigma$ (here, we should also allow the empty word for $\mathrm{val}(\mathbb{C})$): $\mathrm{val}(\mathbb{B})$, $\mathrm{val}(\mathbb{C})\mathrm{val}(\mathbb{D})^\omega$,
$\mathrm{val}(\mathbb{B})^{\overline{\omega}}\mathrm{val}(\mathbb{C})$, $\mathrm{val}(\mathbb{B})^{\overline{\omega}}\mathrm{val}(\mathbb{C})\mathrm{val}(\mathbb{D})^\omega$, $\Gamma^\eta$.

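Proof. We proceed along the hierarchical order of $\mathbb{A}$ and compute for each variable $A$
of $\mathbb{A}$ whether $\mathrm{val}(A)$ is of one of the following forms ($u, w \in \Sigma^+$, $v \in \Sigma^*$, $\Gamma \subseteq \Sigma$,
$a, b \in \Gamma$): $v$, $u^{\overline{\omega}} v$, $v w^\omega$, $u^{\overline{\omega}} v w^\omega$, $\Gamma^\eta$, $a\Gamma^\eta$, $\Gamma^\eta b$, $a\Gamma^\eta b$. Moreover, SLPs for the finite
words $u$, $v$, and $w$ can be computed simultaneously. Observe that from $\mathrm{rhs}(A)$ and the
information already computed we can easily obtain whether $\mathrm{val}(A)$ is of such a form
and in this case of which form. The following identities have to be used for shuffles
($\Gamma \subseteq \Sigma$, $n \geq 0$, $m \geq 1$, $a, a_1, \ldots, a_n \in \Gamma$, and every $u_i$ ($1 \leq i \leq m$) has one of the
forms $c\Gamma^\eta$, $\Gamma^\eta c$, $c\Gamma^\eta d$ with $c, d \in \Gamma$):

$[a_1, \ldots, a_n, u_1, \ldots, u_m]^\eta \cong \Gamma^\eta$

$\Gamma^\eta \Gamma^\eta \cong \Gamma^\eta a \Gamma^\eta \cong (\Gamma^\eta)^\omega \cong (\Gamma^\eta)^{\overline{\omega}} \cong (\Gamma^\eta a)^\omega \cong (a\Gamma^\eta)^{\overline{\omega}} \cong \Gamma^\eta$

$(a\Gamma^\eta)^\omega \cong a\Gamma^\eta$

$(\Gamma^\eta a)^{\overline{\omega}} \cong \Gamma^\eta a$

All these identities can be deduced from the axioms for regular expressions in [2].
Now $\mathrm{val}(\mathbb{A})$ is primitive if and only if $\mathrm{val}(S)$ is of one of the following forms ($u, w \in
\Sigma^+$, $v \in \Sigma^*$, $\Gamma \subseteq \Sigma$): $v$, $u^{\overline{\omega}} v$, $v w^\omega$, $u^{\overline{\omega}} v w^\omega$, $\Gamma^\eta$. □

For our polynomial time equivalence test for succinct expressions that represent
primitive words, we need the following technical lemma.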

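Lemma 4.17. Let $u_i, v_i, w_i$ ($i \in \{1, 2\}$) be finite words such that $|u_1| = |u_2| = |v_1| =
|v_2| = |w_1| = |w_2| > 0$. Then $u_1^{\overline{\omega}} v_1 w_1^\omega = u_2^{\overline{\omega}} v_2 w_2^\omega$ if and only if one of the following
conditions holds:

- $u_2 v_2 w_2$ is a factor of $u_1^2 v_1 w_1^2$.

- $u_1 v_1 w_1$ is a factor of $u_2^2 v_2 w_2^2$.

- $v_1 = w_1$, $u_2 = v_2$, and $u_2 w_2$ is a factor of $u_1^2 w_1^2$.

- $u_1 = v_1$, $v_2 = w_2$, and $u_1 w_1$ is a factor of $u_2^2 w_2^2$.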



Proof. The four conditions from the lemma are shown in Figure 8 and Figure 9. It is
straightforward to show that any of these four situations implies $u_1^{\overline{\omega}} v_1 w_1^\omega = u_2^{\overline{\omega}} v_2 w_2^\omega$.
For instance, if the left situation in Figure 8 occurs, then there exist words $x, y, x', y'$
such that $u_1 = xy$, $u_2 = yx$, $w_1 = x'y'$, $w_2 = y'x'$ and $v_2 w_2 = y v_1 x'$. Hence,

$u_1^{\overline{\omega}} v_1 w_1^\omega = (xy)^{\overline{\omega}} v_1 (x'y')^\omega = (yx)^{\overline{\omega}} y v_1 x' (y'x')^\omega = u_2^{\overline{\omega}} v_2 w_2 w_2^\omega = u_2^{\overline{\omega}} v_2 w_2^\omega$.

Let us now assume that $u_1^{\overline{\omega}} v_1 w_1^\omega = u_2^{\overline{\omega}} v_2 w_2^\omega$. We distinguish the following cases:

Case 1. The occurrence of $v_1$ in $u_1^{\overline{\omega}} v_1 w_1^\omega$ overlaps the occurrence of $v_2$ in $u_2^{\overline{\omega}} v_2 w_2^\omega$.
Then, either $u_2 v_2 w_2$ is a factor of $u_1^2 v_1 w_1^2$ (if $v_2$ starts before $v_1$) or $u_1 v_1 w_1$ is a factor
of $u_2^2 v_2 w_2^2$ (if $v_1$ starts before $v_2$), see Figure 8.

Case 2. The occurrence of $v_1$ in $u_1^{\overline{\omega}} v_1 w_1^\omega$ does not overlap the occurrence of $v_2$ in
$u_2^{\overline{\omega}} v_2 w_2^\omega$.

Case 2.1. The occurrence of $u_1 v_1 w_1$ in $u_1^{\overline{\omega}} v_1 w_1^\omega$ overlaps the occurrence of $v_2$ in
$u_2^{\overline{\omega}} v_2 w_2^\omega$. Then, one of the following two situations occurs:

In the first situation, we obtain $v_1 = w_1$ (since $v_1 w_1$ is a factor of $w_2^\omega$) and $u_2 = v_2$
(since $u_2 v_2$ is a factor of $u_1^{\overline{\omega}}$). Hence, we get the left situation shown in Figure 9, i.e.,
$u_2 w_2$ is a factor of $u_1^2 w_1^2$. In the second situation, we obtain $u_1 = v_1$ (since $u_1 v_1$ is
a factor of $u_2^{\overline{\omega}}$) and $v_2 = w_2$ (since $v_2 w_2$ is a factor of $w_1^\omega$). Hence, we get the right
situation shown in Figure 9, i.e., $u_1 w_1$ is a factor of $u_2^2 w_2^2$.

Case 2.2. The occurrence of $u_1 v_1 w_1$ in $u_1^{\overline{\omega}} v_1 w_1^\omega$ does not overlap the occurrence of
$v_2$ in $u_2^{\overline{\omega}} v_2 w_2^\omega$. Then $u_1 v_1 w_1$ either occurs in $u_2^{\overline{\omega}}$ or $w_2^\omega$. Hence, $u_1 = v_1 = w_1$ and
similarly $u_2 = v_2 = w_2$. But $u_1^{\overline{\omega}} u_1^\omega = u_2^{\overline{\omega}} u_2^\omega$ implies that $u_2$ is a factor of $u_1^2$. Hence,
the third condition from the lemma holds. □

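Lemma 4.18. Given two succinct expressions $\mathbb{A}_1$, $\mathbb{A}_2$ over $\Sigma$ such that $\mathrm{val}(\mathbb{A}_1)$ and
$\mathrm{val}(\mathbb{A}_2)$ are primitive words, we can decide in polynomial time whether $\mathrm{val}(\mathbb{A}_1) =
\mathrm{val}(\mathbb{A}_2)$.

Proof. We have to distinguish the following cases:

Case 1. $\mathrm{val}(\mathbb{A}_i)$ ($i \in \{1, 2\}$) is finite. Then $\mathrm{val}(\mathbb{A}_1) = \mathrm{val}(\mathbb{A}_2)$ can be checked in
polynomial time by Remark 4.13(d).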






Case 2. $\mathrm{val}(\mathbb{A}_i) = \Gamma_i^\eta$ is uniform ($i \in \{1, 2\}$). Then $\mathrm{val}(\mathbb{A}_1) = \mathrm{val}(\mathbb{A}_2)$ if and only if
$\Gamma_1 = \Gamma_2$, which can be checked in polynomial time.

Case 3. $\mathrm{val}(\mathbb{A}_i) = u_i v_i^\omega$ ($i \in \{1, 2\}$). By Lemma 4.16 we can produce SLPs for $u_i$ and
$v_i$ ($i \in \{1, 2\}$) from $\mathbb{A}_1$ and $\mathbb{A}_2$, respectively, in polynomial time. Let $k_i = |u_i|$ and $\ell_i =
|v_i|$. Let $\mathrm{lcm}(\ell_1, \ell_2)$ denote the least common multiple of $\ell_1$ and $\ell_2$. By replacing
$v_i$ by $v_i^{\max(k_1, k_2) \cdot \mathrm{lcm}(\ell_1, \ell_2)/\ell_i}$ (for which we can compute an SLP in polynomial time
by Remark 4.13(b)), we can assume that $|v_1| = |v_2| \geq k_1, k_2$. Let $\ell = |v_1| = |v_2|$.
W.l.o.g. assume that $k_1 \leq k_2$ and let $k = k_2 - k_1 \leq \ell$. Then, we can replace $u_1$ and
$v_1$ by $u_1 v_1[1 : k]$ and $v_1[k+1 : \ell]\, v_1[1 : k]$, respectively (we can compute SLPs for
these words in polynomial time by Remark 4.13(c)). Hence, we can also assume that
$|u_1| = |u_2|$. But then, $u_1 v_1^\omega = u_2 v_2^\omega$ if and only if $u_1 = u_2$ and $v_1 = v_2$, which can be
checked in polynomial time by Remark 4.13(d).
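The normalization in Case 3 can be tried out on explicitly given words. The sketch below is a simplified illustration with ordinary Python strings instead of SLPs (so it is not polynomial in the SLP size), and the function name `ultimately_periodic_equal` is hypothetical. It first brings both periods to the common length $\mathrm{lcm}(\ell_1, \ell_2)$, then lengthens the shorter prefix by unrolling its period and rotating that period accordingly, and finally compares prefix with prefix and period with period.

```python
from math import gcd


def ultimately_periodic_equal(u1: str, v1: str, u2: str, v2: str) -> bool:
    """Decide whether u1 v1^omega = u2 v2^omega for non-empty v1, v2."""
    if not v1 or not v2:
        raise ValueError("periods must be non-empty")
    # Step 1: make both periods have length lcm(|v1|, |v2|).
    common = len(v1) * len(v2) // gcd(len(v1), len(v2))
    v1 = v1 * (common // len(v1))
    v2 = v2 * (common // len(v2))
    # Step 2: make (u1, v1) the pair with the shorter prefix.
    if len(u1) > len(u2):
        u1, v1, u2, v2 = u2, v2, u1, v1
    # Step 3: move k letters from the periodic part into the prefix u1
    # and rotate the period v1 by the same amount.
    k = len(u2) - len(u1)
    u1 = u1 + (v1 * (k // common + 1))[:k]
    shift = k % common
    v1 = v1[shift:] + v1[:shift]
    # Step 4: with equal prefix and period lengths, compare componentwise.
    return u1 == u2 and v1 == v2
```

For example, $a(ba)^\omega$ and $ab(ab)^\omega$ denote the same word, and the function reports them as equal after rotating $ba$ to $ab$. Unlike the proof, the sketch does not inflate the period so that it is longer than both prefixes; reducing the shift modulo the period length has the same effect.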

Case 4. $\mathrm{val}(\mathbb{A}_i) = u_i^{\overline{\omega}} v_i$ ($i \in \{1, 2\}$). This case can be dealt with analogously to Case 3.

Case 5. $\mathrm{val}(\mathbb{A}_i) = u_i^{\overline{\omega}} v_i w_i^\omega$ ($i \in \{1, 2\}$). By Lemma 4.16 we can produce SLPs for
$u_i$, $v_i$, and $w_i$ in polynomial time. As in Case 3, by replacing the words $u_i, w_i$ by
appropriate powers, we can enforce the condition $|u_1| = |u_2| = |w_1| = |w_2| = \ell \geq
|v_1|, |v_2|$. In addition, we can enforce the condition $|v_1| = |v_2| = \ell$ as follows: Let
$k_i = |v_i| \leq \ell$. Then we can replace $v_i$ and $w_i$ by $v_i w_i[1 : \ell - k_i]$ and $w_i[\ell - k_i + 1 :
\ell]\, w_i[1 : \ell - k_i]$, respectively. Now that we have $|u_1| = |u_2| = |v_1| = |v_2| = |w_1| =
|w_2|$, we can check $u_1^{\overline{\omega}} v_1 w_1^\omega = u_2^{\overline{\omega}} v_2 w_2^\omega$ in polynomial time using Lemma 4.17 and
Remark 4.13(e). □

4.6 A polynomial time equivalence test for succinct expressions 

In this section, we will finally prove Theorem 4.15. The general strategy is very similar
to [2]. We will incrementally reduce the $\omega\eta$-depth of the two given succinct
expressions, until one of them (or both) describes a primitive word. This allows us to use the
results from the previous section. We have to analyze carefully the size of the intermediate
succinct expressions. In the following, $\Sigma$ will always refer to a finite alphabet. We
will need certain nice properties of SESs.

Definition 4.19 (primitive). A primitive SES is an SES $\mathbb{A} = (V, \Sigma, \mathrm{rhs})$ such that
$\mathrm{val}_{\mathbb{A}}(X)$ is primitive for all $X \in V$. A 2-level system $\mathbb{B}$ is primitive if $\mathrm{lo}(\mathbb{B})$ is primitive.

Definition 4.20 (irredundant). An irredundant SES is an SES $\mathbb{A} = (V, \Sigma, \mathrm{rhs})$ such
that $\mathrm{val}_{\mathbb{A}}(X) \neq \mathrm{val}_{\mathbb{A}}(Y)$ for all $X, Y \in V$ with $X \neq Y$. Again we say that a 2-level
system $\mathbb{B}$ is irredundant if $\mathrm{lo}(\mathbb{B})$ is irredundant.

One can think of a primitive and irredundant SES as a succinct representation of a finite
subset of $D(\Sigma)$ where $\mathrm{val}_{\mathbb{A}} : V \to D(\Sigma)$ defines an injective mapping from $V$ to
this finite subset. Hence, for a regular word $u$ such that all blocks belong to the image
of $\mathrm{val}_{\mathbb{A}}$, we can define the $\mathrm{val}_{\mathbb{A}}$-skeleton of $u$. In the following, we will simply call it
the $\mathbb{A}$-skeleton of $u$. A primitive and irredundant 2-level system intuitively is a system
where the terminal alphabet is a finite subset of $D(\Sigma)$ (namely the valuations of the
variables of the lower part $\mathrm{lo}(\mathbb{B})$).

Remark 4.21. If a primitive 2-level system $\mathbb{B}$ is not irredundant then, using Lemma 4.18,
one can produce in polynomial time an irredundant 2-level system $\mathbb{C}$ such that $\mathrm{val}(\mathbb{B}) =
\mathrm{val}(\mathbb{C})$. Indeed, if there are two different variables $X, Y \in \mathrm{Lo}$ such that $\mathrm{val}_{\mathbb{B}}(X) =
\mathrm{val}_{\mathbb{B}}(Y)$, then one has to replace $X$ in all right-hand sides by $Y$. Thereafter $X$ can be
removed from $\mathrm{Lo}$. Note that this process does not change the set of upper level variables
of $\mathbb{B}$.
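The replacement step of Remark 4.21 can be sketched as follows. This is a simplified illustration: right-hand sides are flattened to lists of symbols (the operators are ignored), and `equal` is a hypothetical oracle that decides $\mathrm{val}(X) = \mathrm{val}(Y)$ for lower level variables, a role played by the test of Lemma 4.18 in the text.

```python
from typing import Callable, Dict, List, Tuple


def make_irredundant(
    lo_vars: List[str],
    rhs: Dict[str, List[str]],
    equal: Callable[[str, str], bool],
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Merge lower level variables with equal values.

    Returns the kept lower level variables and the updated right-hand sides.
    """
    representative: Dict[str, str] = {}
    kept: List[str] = []
    for x in lo_vars:
        for y in kept:
            if equal(x, y):          # val(x) = val(y): x becomes redundant
                representative[x] = y
                break
        else:
            kept.append(x)
    # Replace every redundant variable by its representative and drop its rule.
    new_rhs = {
        x: [representative.get(s, s) for s in body]
        for x, body in rhs.items()
        if x not in representative
    }
    return kept, new_rhs
```

The procedure makes at most quadratically many oracle calls in the number of lower level variables and never touches the set of upper level variables.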

Assume that $\mathbb{B}$ is an SES or 2-level system and let $u = (A_i)_{i \in I}$ be a (possibly infinite)
sequence of variables of $\mathbb{B}$. We say that $u$ does not merge (is good, semi-good,
irreducible), if the sequence $(\mathrm{val}(A_i))_{i \in I}$ does not merge (is good, semi-good, irreducible).
Moreover, two sequences $u = (A_i)_{i \in I}$ and $v = (B_j)_{j \in J}$ of variables (possibly from
two different SESs or 2-level systems) are equivalent if the sequences $(\mathrm{val}(A_i))_{i \in I}$ and
$(\mathrm{val}(B_j))_{j \in J}$ are equivalent (i.e., $\prod_{i \in I} \mathrm{val}(A_i)$ and $\prod_{j \in J} \mathrm{val}(B_j)$ are isomorphic
generalized words). The following definition is an adaptation of the definition of a proper
expression in [2].

Definition 4.22 (proper). Let $\mathbb{B} = (\mathrm{Up}, \mathrm{Lo}, \Sigma, \mathrm{rhs})$ be a primitive 2-level system. A
variable $X \in \mathrm{Lo} \cup \mathrm{Up}$ is proper if one of the following cases holds:

(1) $X \in \mathrm{Lo}$.

(2) $\mathrm{rhs}(X) = Y_1 \cdots Y_n$, where $Y_1 \cdots Y_n$ does not merge and $Y_1, \ldots, Y_n$ are proper.

(3) $\mathrm{rhs}(X) = Y^\omega$ or $\mathrm{rhs}(X) = Y^{\overline{\omega}}$, where $Y$ is proper and $YYY$ does not merge.

(4) $\mathrm{rhs}(X) = [Y_1, \ldots, Y_n]^\eta$ where $Y_1, \ldots, Y_n$ are proper and $\mathrm{val}(X)$ is not primitive.

The 2-level system $\mathbb{B}$ is proper if $\mathbb{B}$ is irredundant, primitive, and all variables are
proper.

Note that the condition that $YYY$ does not merge in Definition 4.22(3) implies that
$YYY \cdots$ and $\cdots YYY$ both do not merge by [2, Corollary 32]. Moreover, condition
(4) from Definition 4.22 means that $Y_1, \ldots, Y_n$ are proper and at least one $\mathrm{val}(Y_i)$ is not
a single symbol.

Lemma 4.23 (see [2, Corollary 75]). Let $\mathbb{B}$ be a proper 2-level system and $X$ an upper
level variable. Then $\mathrm{uval}(X)$ is the $\mathrm{lo}(\mathbb{B})$-skeleton of $\mathrm{val}(X)$.

The next two lemmas will be used to make a given 2-level system proper.

Lemma 4.24. Given a primitive 2-level system $\mathbb{B}$ and a finite semi-good sequence
$A_1 \cdots A_m$ of variables of $\mathbb{B}$, we can produce in polynomial time a primitive 2-level
system $\mathbb{C}$ and a sequence $B_1 \cdots B_n$ of variables of $\mathbb{C}$ such that the following holds:

- The upper parts of $\mathbb{B}$ and $\mathbb{C}$ are the same, and the lower part of $\mathbb{C}$ extends the lower
part of $\mathbb{B}$ by at most $m-1$ many new lower level variables, whose right-hand sides
have length 2.

- The sequence $B_1 \cdots B_n$ is good.

- $A_1 \cdots A_m$ and $B_1 \cdots B_n$ are equivalent sequences.

- The subsequence of upper level variables in $A_1 \cdots A_m$ is the same as the
subsequence of upper level variables in $B_1 \cdots B_n$.

- $n \leq m$.

Proof. As long as the sequence $A_1 \cdots A_m$ contains a factor $A_i A_{i+1}$ or $A_i A_{i+1} A_{i+2}$,
whose evaluation is a left-hand side of our rewriting system $R$, we do the following:

If $\mathrm{val}(A_i)$ is right-closed and $\mathrm{val}(A_{i+1})$ is left-closed, then we introduce a new
lower level variable $A$, set $\mathrm{rhs}(A) = A_i A_{i+1}$, and replace the sequence $A_1 \cdots A_m$
by the sequence $A_1 \cdots A_{i-1} A A_{i+2} \cdots A_m$. If $\mathrm{val}(A_i) = \mathrm{val}(A_{i+1}) = \Gamma^\eta$ for some
$\Gamma \subseteq \Sigma$, we continue with the sequence $A_1 \cdots A_{i-1} A_{i+1} \cdots A_m$. Finally, if $\mathrm{val}(A_i) =
\mathrm{val}(A_{i+2}) = \Gamma^\eta$ for some $\Gamma \subseteq \Sigma$ and $\mathrm{val}(A_{i+1}) = a \in \Gamma$, we continue with the
sequence $A_1 \cdots A_{i-1} A_{i+2} \cdots A_m$. We iterate this process as long as possible. □

Lemma 4.25. Given a primitive 2-level system $\mathbb{B}$ and a finite irreducible sequence
$A_1 \cdots A_k$ ($k \geq 3$), where every $A_i$ is a lower level variable of $\mathbb{B}$, we can produce in
polynomial time a primitive 2-level system $\mathbb{C}$ and sequences $B_1 \cdots B_m$, $C_1 \cdots C_n$ ($m \geq 0$,
$n \geq 1$) of lower level variables of $\mathbb{C}$ such that the following holds:

- The upper parts of $\mathbb{B}$ and $\mathbb{C}$ are the same, and the lower part of $\mathbb{C}$ extends the lower
part of $\mathbb{B}$ by at most one new lower level variable, whose right-hand side has length 2.

- The infinite sequence $B_1 \cdots B_m (C_1 \cdots C_n)^\omega$ is irreducible.

- $(A_1 \cdots A_k)^\omega$ and $B_1 \cdots B_m (C_1 \cdots C_n)^\omega$ are equivalent sequences.

- $m, n \leq k$.

Proof. W.l.o.g. assume that $(A_1 \cdots A_k)^\omega$ is not irreducible. Since $A_1 \cdots A_k$ is
irreducible, an $R$-reduction in the infinite sequence $A_1 \cdots A_k A_1 \cdots A_k A_1 \cdots A_k \cdots$ can
only occur at a border between $A_k$ and $A_1$. There are the following cases, according to
the left-hand sides of the system $R$.

Case 1. $\mathrm{val}(A_k) = \mathrm{val}(A_1) = \Gamma^\eta$ for some $\Gamma \subseteq \Sigma$. Then, the infinite sequence
$A_1 A_2 \cdots A_k (A_2 \cdots A_k)^\omega$ is irreducible and equivalent to our original sequence (recall
that $k \geq 3$).

Case 2. $\mathrm{val}(A_k)$ is scattered and right-closed, $\mathrm{val}(A_1)$ is scattered and left-closed. Then,
we introduce a new lower level variable $A$ with $\mathrm{rhs}(A) = A_k A_1$. It follows that the
infinite sequence $A_1 A_2 \cdots A_{k-1} (A A_2 \cdots A_{k-1})^\omega$ is irreducible and equivalent to our
original sequence.

Case 3. $\mathrm{val}(A_k) = \Gamma^\eta$, $\mathrm{val}(A_1) = a$, $\mathrm{val}(A_2) = \Gamma^\eta$ for some $\Gamma \subseteq \Sigma$ and $a \in \Gamma$. If $k =
3$, then $A_1 A_2 \cdots A_k = A_1 A_2 A_3$ would not be irreducible (since $\mathrm{val}(A_2) = \mathrm{val}(A_3) =
\Gamma^\eta$), which contradicts our assumptions. Hence, assume that $k \geq 4$. Then, the sequence
$A_1 A_2 \cdots A_k (A_3 \cdots A_k)^\omega$ is again irreducible and equivalent to our original sequence.

Case 4. $\mathrm{val}(A_{k-1}) = \Gamma^\eta$, $\mathrm{val}(A_k) = a$, $\mathrm{val}(A_1) = \Gamma^\eta$ for some $\Gamma \subseteq \Sigma$ and $a \in \Gamma$.
This case is similar to Case 3. □

Let $\mathbb{B}$ be an SES and $X$ a variable with $\omega\eta\text{-}\mathrm{depth}(X) = h \geq 1$. Then there is a
sequence of variables $X_1, \ldots, X_h$ such that $X_h = X$, $X_i \leq_{\mathbb{B}} X_{i+1}$, and $\omega\eta\text{-}\mathrm{depth}(X_i) =
i$. Note that $\mathrm{val}(X_1)$ is either primitive or a shuffle of finite words. If $\mathrm{val}(X_1) =
[u_1, \ldots, u_k]^\eta$ where at least one of the $u_i$ is in $\Sigma^{\geq 2}$ (thus, $\mathrm{val}(X_1)$ is not primitive),
then this sequence is called a bad sequence. If a variable $X$ has a bad sequence, then
we say it is of bad shape. Otherwise it is of good shape. For instance, if $\mathrm{rhs}(X) = [Y]^\eta$
and $\mathrm{rhs}(Y) = ab$, then $X$ is of bad shape.

Proposition 4.26. Let $\mathbb{B} = (V, \Sigma, \mathrm{rhs})$ be an SES such that for every variable $X \in V$,
either $\mathrm{rhs}(X) \in \Sigma^+ \cup \Sigma^* V \Sigma^* \cup VV$ or $\mathrm{rhs}(X)$ is of the form $Y^\omega$, $Y^{\overline{\omega}}$, or $[Y_1, \ldots, Y_n]^\eta$
for $Y, Y_1, \ldots, Y_n \in V \cup \Sigma$. Given $\mathbb{B}$ we can produce in polynomial time a proper
2-level system $\mathbb{C} = (\mathrm{Up}, \mathrm{Lo}, \Sigma, \mathrm{rhs})$ such that every variable $X \in V$, where $\mathrm{val}_{\mathbb{B}}(X)$ is
not primitive, belongs to $\mathrm{Up}$ and for each of these variables $X$ we have:

(a) $\mathrm{val}_{\mathbb{B}}(X) = \mathrm{val}_{\mathbb{C}}(X)$.

(b) If $X$ is of good shape in $\mathbb{B}$, then $\omega\eta\text{-}\mathrm{depth}_{\mathbb{B}}(X) > \omega\eta\text{-}\mathrm{depth}_{\mathrm{up}(\mathbb{C})}(X)$.

(c) If $X$ is of bad shape in $\mathbb{B}$, then $\omega\eta\text{-}\mathrm{depth}_{\mathbb{B}}(X) = \omega\eta\text{-}\mathrm{depth}_{\mathrm{up}(\mathbb{C})}(X)$ and $X$ is of
good shape in $\mathrm{up}(\mathbb{C})$.

Proof. W.l.o.g. we can assume that $\mathrm{val}(\mathbb{B})$ is not primitive. We start with some
preprocessing.

Preprocessing. First we transform our succinct expression $\mathbb{B}$ into a 2-level system $\mathbb{C}$
by collecting in $\mathrm{Lo}$ all variables $X$ such that $\mathrm{val}(X)$ is primitive. This can be done in
polynomial time using Lemma 4.16. Note that if $\mathrm{val}(X)$ is primitive and scattered, then
for every $Y$ in $\mathrm{rhs}(X)$, $\mathrm{val}(Y)$ is primitive too. But if $\mathrm{val}(X)$ is primitive and dense (i.e.,
of the form $\Gamma^\eta$ for some $\Gamma \subseteq \Sigma$), then this is not necessarily true.^ Hence, in this case
we have to redefine $\mathrm{rhs}(X) = \Gamma^\eta$. After this process the 2-level system $\mathbb{C}$ is already
primitive, satisfies conditions (a), (b), and (c) in our proposition, and for all $X \in \mathrm{Up}$
the word $\mathrm{val}(X)$ is not primitive. All these properties will stay invariant throughout the
remaining proof where we manipulate the system $\mathbb{C}$ in order to make it proper.

Before we come to the actual algorithm we transform $\mathbb{C}$ for technical convenience
such that for all $X \in \mathrm{Up}$ one of the following holds:

(1) $\mathrm{rhs}(X) \in \mathrm{Lo}^{\geq 2} \cup \mathrm{Lo}^* \mathrm{Up}\, \mathrm{Lo}^*$,

(2) $\mathrm{rhs}(X) = [Y_1, \ldots, Y_n]^\eta$ for some $Y_1, \ldots, Y_n \in \mathrm{Up} \cup \mathrm{Lo}$,

(3) $\mathrm{rhs}(X) \in \mathrm{Up}\, \mathrm{Up}$,

(4) $\mathrm{rhs}(X) = Y^\omega$ for $Y \in \mathrm{Up} \cup \mathrm{Lo}$,

(5) $\mathrm{rhs}(X) = Y^{\overline{\omega}}$ for $Y \in \mathrm{Up} \cup \mathrm{Lo}$.

In order to achieve this form we simply introduce for each upper level variable $X$ with
$\mathrm{rhs}(X) = uYv$ where $u, v \in \Sigma^*$ and $Y \in V$ two variables $X_u, X_v \in \mathrm{Lo}$ and set
$\mathrm{rhs}(X) = X_u Y X_v$, $\mathrm{rhs}(X_u) = u$, and $\mathrm{rhs}(X_v) = v$ (if e.g. $u = \varepsilon$, then $X_u$ is not
present). Moreover, if a symbol $a \in \Sigma$ occurs in a right-hand side of the form $Y^\omega$, $Y^{\overline{\omega}}$,
or $[Y_1, \ldots, Y_n]^\eta$, then we replace that occurrence by a new $\mathrm{Lo}$-variable with right-hand
side $a$.

In fact, by this preprocessing all right-hand sides of the form (1) have length at most
3. This fact will be important when we estimate the size of the final system. From now
on variables in $\mathrm{Up}$ that have a right-hand side of form (1) or (2) are said to be of type
(1, 2), all other variables are said to be of type (3-5).

^ Let, for instance, $\mathrm{rhs}(X) = [Y]^\eta$ with $\mathrm{val}(Y) = a[a]^\eta$. Then $\mathrm{val}(X) = [a]^\eta$ is primitive but
$\mathrm{val}(Y)$ is not primitive.

Following [2, proof of Theorem 65 & 66] we will now give an algorithm that produces
a proper 2-level system. We will proceed along the hierarchical order of the variables
in $\mathrm{Up}$ where in each step we possibly add a constant number of new variables and
change the right-hand sides of the old variables such that all variables are proper and
of the form (1)-(5) and, moreover, all old variables $X$ are of type (1, 2) and fulfill the
following technical condition (TEC):



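(a) If $\mathrm{val}(X)$ has a first block, then $\mathrm{rhs}(X) \in \mathrm{Lo}^{\geq 2} \cup \mathrm{Lo}^+ \mathrm{Up}\, \mathrm{Lo}^*$ and the first
variable of $\mathrm{rhs}(X)$ evaluates to the first block of $\mathrm{val}(X)$.

(b) If $\mathrm{val}(X)$ has a second block and the first block is scattered, then $\mathrm{rhs}(X) \in
\mathrm{Lo}^{\geq 2} \cup \mathrm{Lo}^{\geq 2} \mathrm{Up}\, \mathrm{Lo}^*$ and the second variable of $\mathrm{rhs}(X)$ evaluates to the second
block of $\mathrm{val}(X)$.

(c) If $\mathrm{val}(X)$ has a last block, then $\mathrm{rhs}(X) \in \mathrm{Lo}^{\geq 2} \cup \mathrm{Lo}^* \mathrm{Up}\, \mathrm{Lo}^+$ and the last
variable of $\mathrm{rhs}(X)$ evaluates to the last block of $\mathrm{val}(X)$.

(d) If $\mathrm{val}(X)$ has a second last block and the last block is scattered, then $\mathrm{rhs}(X) \in
\mathrm{Lo}^{\geq 2} \cup \mathrm{Lo}^* \mathrm{Up}\, \mathrm{Lo}^{\geq 2}$ and the second last variable of $\mathrm{rhs}(X)$ evaluates to the
second last block of $\mathrm{val}(X)$.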



We need the following claim about this property (TEC): 

Claim. If $rhs(X) \in Lo^+ \cup Lo^* Up Lo^*$ and rhs(X) is good, then X satisfies (TEC).

Proof. By symmetry, let us only consider conditions (a) and (b) of (TEC). Assume that rhs(X) is a good sequence. If $rhs(X) \in Lo^+$, then Lemma 4.10 implies that the variables in rhs(X) evaluate to the blocks of val(X) (recall that rhs(X) is good). Hence (a) and (b) hold. Next, assume that $rhs(X) \in Lo^{\geq 2} Up Lo^*$. Again, since rhs(X) is good, Lemma 4.10 implies that the first two variables in rhs(X) evaluate to the first two blocks of val(X). Thus, (a) and (b) hold again. If $rhs(X) \in Up Lo^*$, then the first variable of rhs(X) evaluates to a non-primitive word. Since rhs(X) is good, it follows that val(X) does not have a first block, and (a) and (b) hold. Finally, assume that $rhs(X) \in Lo\, Up\, Lo^*$ and the first two variables of rhs(X) are $A \in Lo$ and $Z \in Up$. Then val(A) is the first block of val(X). Since rhs(X) is good, either val(Z) does not have a first block, or val(Z) has a first block, val(A) is uniform, and (val(A), val(Z)) does not merge. In both cases (a) and (b) are satisfied. This proves the claim.

Actual algorithm. We can now outline our procedure. Consider a variable $X \in Up$ such that every variable in rhs(X) is either in Lo or was already processed and is therefore now proper, satisfies (TEC), and is of type (1, 2). We need to distinguish on the form of the right-hand side of X. In all of the following cases, we reset rhs(X) either

(i) to a shuffle of variables that are already proper or

(ii) to a good sequence from $Lo^+ \cup Lo^* Up Lo^*$ (and all variables in that sequence are
already proper).

In (i), X is proper by Definition 4.22(4) (note that val(X) is not primitive since $X \in Up$). In (ii), it follows from Lemma 4.10 and Claim 4.6 that X is proper and satisfies (TEC). For every other new upper level variable Y that is introduced, the right-hand side is either

(i) a non-merging sequence of (already proper) variables or

(ii) $Z^\omega$ or $Z^{\overline{\omega}}$, where Z is already proper and ZZZ does not merge.

In both cases it follows from Definition 4.22 that Y is proper too.

Case 1. $rhs(X) \in Lo^2 \cup Lo^3$ (hence rhs(X) is semi-good). By applying Lemma 4.24
to rhs(X), we can compute in polynomial time an equivalent good sequence of at most 
three possibly new Lo- variables (and their corresponding right-hand sides). This se- 
quence becomes the new right-hand side of X. 

Case 2. $rhs(X) \in Lo^* Up Lo^*$. Let Z be the unique Up-variable in rhs(X). Note that Z is one of the old variables, which has already been processed and hence is proper, of type (1, 2), and satisfies (TEC). If $rhs(Z) \in Lo^{\geq 2} \cup Lo^* Up Lo^*$, then we replace Z in rhs(X) by rhs(Z) (if rhs(Z) is a shuffle, then we leave Z in rhs(X)). Recall that Z is proper and satisfies (TEC). It follows easily that the resulting new right-hand side of X is semi-good and in $Lo^{\geq 2} \cup Lo^* Up Lo^*$. Thus, we can apply Lemma 4.24 and obtain an equivalent good sequence in $Lo^+ \cup Lo^* Up Lo^*$ (as in Case 1, this introduces new Lo-variables). This good sequence will be the new right-hand side of X.

Case 3. $rhs(X) = [Y_1, \ldots, Y_k]^\eta$. Then there is nothing to do. Recall that we assumed
that val(X) is not primitive; hence X is proper and satisfies the technical condition
(TEC), as val(X) has neither a first nor a last block.

Case 4. $rhs(X) = YZ$ for some $Y, Z \in Up$. Here Y and Z are old variables, which have already been processed and therefore are proper, of type (1, 2), and satisfy (TEC). If $rhs(Y) \in Lo^{\geq 2} \cup Lo^* Up Lo^*$, then we replace Y in YZ by rhs(Y) (if rhs(Y) is a shuffle, we leave Y in YZ). We proceed analogously with Z in YZ. Since Y and Z are proper and satisfy (TEC), it follows (as in Case 2) that the resulting new right-hand side of X is semi-good and contains at most two variables from Up. Thus we can apply Lemma 4.24 and obtain an equivalent good sequence u of variables with at most two variables from Up (again, this introduces new Lo-variables).

Now we replace parts of the sequence u in order to get rhs(X). First, assume that $u = A_1 \cdots A_k \in Lo^+$. If $k \leq 5$, then rhs(X) simply becomes u (which is good). If $k \geq 6$, then we introduce a new Up-variable U and set

$rhs(X) = A_1 A_2 U A_{k-1} A_k$, $rhs(U) = A_3 \cdots A_{k-2}$.

Since u is good, both right-hand sides are good as well. Second, assume that $u = A_1 \cdots A_k U B_1 \cdots B_\ell \in Lo^* Up Lo^*$ with $U \in Up$. If $k \leq 2$ and $\ell \leq 2$, then we simply set rhs(X) = u. On the other hand, if $k > 2$ or $\ell > 2$, then we introduce a new Up-variable V and set

$rhs(X) = A_1 A_2 V B_{\ell-1} B_\ell$, $rhs(V) = A_3 \cdots A_k U B_1 \cdots B_{\ell-2}$

(if, e.g., $k > 2$ but $\ell = 1$, then $B_1 \cdots B_{\ell-2}$ and $B_{\ell-1}$ disappear). Since u is good, rhs(X) will be good too. Moreover, since u does not merge (by Lemma 4.10), rhs(V) does not merge either (rhs(V) is not necessarily good). Third, assume that $u = A_1 \cdots A_k U B_1 \cdots B_\ell V C_1 \cdots C_n \in Lo^* Up Lo^* Up Lo^*$ with $U, V \in Up$. In this case we introduce two new Up-variables $W_1$ and $W_2$ and set

$rhs(X) = A_1 A_2 W_1 C_1 \cdots C_n$, $rhs(W_1) = W_2 V$, $rhs(W_2) = A_3 \cdots A_k U B_1 \cdots B_\ell$.

Again, since u is good, rhs(X) is good as well. Moreover, since u does not merge, neither $rhs(W_1)$ nor $rhs(W_2)$ merges. Note that the number n in the right-hand side of X above is bounded by |rhs(Z)|. This will be important for estimating the length of right-hand sides.
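
The first subcase of Case 4 (a good sequence consisting only of Lo-variables) can be illustrated by a small sketch. The function name `split_lo_sequence`, the dictionary representation of right-hand sides, and the default name of the fresh variable are assumptions made for this illustration only; the sketch does not check goodness, it only shows how the sequence is cut.

```python
def split_lo_sequence(seq, fresh_name="U"):
    """Distribute a sequence A_1 ... A_k of Lo-variables over rhs(X) and,
    if k >= 6, over the right-hand side of one fresh Up-variable.

    Returns a dict mapping variable names to right-hand sides (lists).
    """
    k = len(seq)
    if k <= 5:
        # Short sequences are kept as they are.
        return {"X": list(seq)}
    # Keep two variables at each end, move the middle part into the fresh variable.
    return {
        "X": [seq[0], seq[1], fresh_name, seq[-2], seq[-1]],
        fresh_name: list(seq[2:-2]),
    }


result = split_lo_sequence(["A1", "A2", "A3", "A4", "A5", "A6", "A7"])
print(result["X"])  # ['A1', 'A2', 'U', 'A6', 'A7']
print(result["U"])  # ['A3', 'A4', 'A5']
```

After the split, rhs(X) always has length at most 5, which is the bound used later in the runtime analysis.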

Case 5. $rhs(X) = Y^\omega$. Note that Y is either a Lo-variable, or it is an old Up-variable, which has already been processed and hence is proper, of type (1, 2), and satisfies (TEC). We can therefore distinguish the following subcases.

Case 5(a). $rhs(Y) = [Z_1, \ldots, Z_n]^\eta$ for some $Z_1, \ldots, Z_n \in Lo \cup Up$. Then by the general identity $(\Gamma^\eta)^\omega = \Gamma^\eta$ (which follows from Cantor's theorem), we have val(X) = val(Y) and we set rhs(X) = Y. Then X is obviously proper. Since we assumed that val(X) is not primitive, val(X) does not have a first or a last block and (TEC) is satisfied.

Case 5(b). $rhs(Y) \in Lo^* Up Lo^*$. Let $rhs(Y) = uZv$ with $Z \in Up$ and $u, v \in Lo^*$. Since Y is proper and satisfies (TEC), the infinite sequence $uZvuZv \cdots = u(Zvu)^\omega$ is semi-good. By applying Lemma 4.24 to the sequence vu of Lo-variables, we obtain an equivalent good sequence $u(Zw)^\omega$. Here w is a sequence of (possibly new) Lo-variables such that w represents the irreducible normal form w.r.t. R of the sequence represented by vu. Note that $|w| \leq |vu|$. We set

$rhs(X) = uV$, $rhs(V) = U^\omega$, $rhs(U) = Zw$.

Since the sequence $u(Zw)^\omega$ is good, the sequence uV is good as well. Moreover, since $u(Zw)^\omega$ does not merge (by Lemma 4.10), the same holds for rhs(U) and UUU (so U and V are proper by definition).

Case 5(c). $Y \in Lo$ and hence val(Y) is primitive. Then the infinite sequence $YYY \cdots$ must be irreducible, because otherwise val(Y) would be either finite or uniform and $val(X) = val(Y)^\omega$ would be primitive. We introduce a new Up-variable Z and set

$rhs(X) = YYZ$, $rhs(Z) = Y^\omega$.

Then rhs(X) is good and YYY does not merge.

Case 5(d). $rhs(Y) \in Lo^2$. Let $rhs(Y) = A_1 A_2$ for $A_1, A_2 \in Lo$. Since Y is already proper, we know that $A_1 A_2$ is irreducible. If the infinite sequence $A_1 A_2 A_1 A_2 \cdots$ is irreducible too, then we introduce a new Up-variable Z and set

$rhs(X) = A_1 A_2 Z$, $rhs(Z) = Y^\omega$.

Clearly, rhs(X) is good and YYY does not merge. On the other hand, if $A_1 A_2 A_1 A_2 \cdots$ is not irreducible, then (since $A_1 A_2$ is irreducible) an R-reduction can only occur at a border between $A_2$ and $A_1$. The case that $val(A_1) = val(A_2) = \Gamma^\eta$ for some $\Gamma \subseteq \Sigma$ cannot occur (since $A_1 A_2$ is irreducible). If $val(A_2)$ is scattered and right-closed and $val(A_1)$ is scattered and left-closed, then we introduce a new Lo-variable B and a new Up-variable Z and set

$rhs(X) = A_1 B Z$, $rhs(Z) = B^\omega$, $rhs(B) = A_2 A_1$.

It is straightforward to show that the infinite sequence $A_1 BBB \cdots$ is irreducible. Hence rhs(X) is good and BBB does not merge. Next, if $val(A_1) = \Gamma^\eta$ and $val(A_2) = a$ for some $\Gamma \subseteq \Sigma$ and $a \in \Gamma$, then $A_1 A_2 A_1 A_2 \cdots$ evaluates to $\Gamma^\eta$. Hence, val(X) is primitive, which is a contradiction. Finally, if $val(A_2) = \Gamma^\eta$ and $val(A_1) = a \in \Gamma$, then $A_1 A_2 A_1 A_2 \cdots$ evaluates to $a\Gamma^\eta = val(Y)$ and we set rhs(X) = Y.

Case 5(e). $rhs(Y) \in Lo^{\geq 3}$. We apply Lemma 4.25 to the irreducible sequence rhs(Y) and compute sequences u, v of (possibly new) Lo-variables with their corresponding right-hand sides. The infinite sequence $uv^\omega$ of Lo-variables is irreducible and evaluates to $val(Y)^\omega$. W.l.o.g. we can assume $|u| \geq 2$ (otherwise, we can replace u by uvv). We introduce new Up-variables U and V and set

$rhs(X) = uV$, $rhs(V) = U^\omega$, $rhs(U) = v$

(if $|v| = 1$, i.e., v consists of a single Lo-variable, then we do not need U).

Case 6. $rhs(X) = Y^{\overline{\omega}}$. This case is symmetric to Case 5.

The resulting system C is primitive and all Up-variables are proper. On the other hand, C is not necessarily irredundant. But this can be easily achieved as described in Remark 4.21. □

We are now in the position to prove Theorem 4.15. 

Proof of Theorem 4.15. It suffices to show that the following problem can be solved in polynomial time:

INPUT: An SES A and two variables X, Y of A.
QUESTION: $val(X) \cong val(Y)$?

If both variables X and Y evaluate to primitive words, then we just need to apply Lemma 4.18. If only one of the two evaluates to a primitive word, then $val(X) \not\cong val(Y)$. Hence, we may assume that both val(X) and val(Y) are not primitive. In particular, we have $\omega\eta$-depth(X), $\omega\eta$-depth(Y) > 0. It is easy to bring A into the normal form required in Proposition 4.26. Applying Proposition 4.26 to A gives a proper 2-level system $A_0$. The variables X and Y belong to the upper level part of $A_0$. Starting with $A_0$ we construct a sequence of proper 2-level systems $A_j = (Up_j, Lo_j, Lo_{j-1}, rhs_j)$ (with $Lo_{-1} = \Sigma$). In order to obtain $A_j$ we apply the procedure of Proposition 4.26 to $up(A_{j-1})$. Let k be maximal such that X and Y belong to the upper level part of $A_k$. Since by Proposition 4.26 in every second step the $\omega\eta$-depth of X and Y strictly decreases, we have $k \leq 2 \cdot |A|$.

Let $0 \leq j \leq k$. By Lemma 4.23, $uval_j(X)$ is the $lo(A_j)$-skeleton of $val_j(X)$, and similarly for Y. Hence $val_j(X) \cong val_j(Y)$ if and only if $uval_j(X) \cong uval_j(Y)$ by Proposition 4.3. Recall that $A_{j+1}$ is obtained by applying the procedure of Proposition 4.26 to $up(A_j)$. We obtain $val_j(X) \cong val_j(Y)$ if and only if $val_{j+1}(X) \cong val_{j+1}(Y)$ for all $0 \leq j < k$. Hence, $val(X) \cong val(Y)$ if and only if $val_k(X) \cong val_k(Y)$ if and only if $uval_k(X) \cong uval_k(Y)$. Now, by the maximality of k, $uval_k(X)$ or $uval_k(Y)$ must be primitive. Hence, using Lemma 4.18, we can check in polynomial time whether $uval_k(X) \cong uval_k(Y)$.

Runtime. Let us analyze the system $up(A_j)$ for $1 \leq j \leq k$. The 2-level system $A_j$ is obtained by applying Proposition 4.26 to $up(A_{j-1})$. Observe that by the construction in the proof, the system $up(A_j)$ already has the normal form that we require in Proposition 4.26. Let $Type(3\text{-}5)_j$ be the set of variables in $Up_j$ that are of type (3-5).

Now let us estimate the number $|Up_j|$ for $1 \leq j \leq k$. Observe that in the proof of Proposition 4.26, in each of the Cases 1-3 only new lower level variables are introduced. In each of the Cases 4-6 the old variable is turned into a variable of type (1, 2) and at most one new variable of type (3-5) is added to $Up_j$. Moreover, at most one additional new variable of type (1, 2) is added to $Up_j$. We conclude that $|Type(3\text{-}5)_j| \leq |Type(3\text{-}5)_{j-1}|$ and the total number of variables in $Up_j$ is bounded by $|Up_{j-1}| + 2 \cdot |Type(3\text{-}5)_{j-1}|$. Recall that $j \leq k \leq 2|A|$. Hence $|Up_j| \leq |Up_0| + 2j \cdot |Type(3\text{-}5)_0| \leq |A_0| \cdot (4 \cdot |A| + 1)$ for all $0 \leq j \leq k$.

Let us now estimate the maximal length of a right-hand side in $A_j$. Let us first bound the length of the right-hand side of a variable $X \in Up_j \cap Up_{j-1}$ (i.e., an old variable). By reanalyzing all cases from the proof of Proposition 4.26, we see that for such a variable X, $|rhs_j(X)|$ is either at most 5 or it is bounded by $3 + |rhs_j(Y)|$, where Y is an old variable, which was processed before. We therefore obtain $|rhs_j(X)| \leq 3 \cdot |Up_j \cap Up_{j-1}| + 5$. Hence, $|rhs_j(X)| \leq 3 \cdot |A_0| \cdot (4 \cdot |A| + 1) + 5$. For the newly added variables $X \in Up_j \setminus Up_{j-1}$, the size of the right-hand side is bounded by twice the maximal size of a right-hand side of an old variable in $Up_j \cap Up_{j-1}$ (the factor 2 comes from Case 4). Hence $|rhs_j(X)| \leq 6 \cdot |A_0| \cdot (4 \cdot |A| + 1) + 10$ for all $X \in Up_j$. Finally, note that $|A_0|$ is polynomially bounded in $|A|$.

Concerning lower level variables of $A_j$, note that the length $|rhs_j(A)|$ for a lower level variable A of $A_j$ is bounded by 2 (if A is introduced in one of the Cases 1-6) or by the maximal length of the right-hand side of a variable from $A_{j-1}$ (if A is introduced in the preprocessing step). Moreover, in each of the Cases 1-6, the number of new lower level variables that are introduced is bounded by twice the maximal size of a right-hand side of an old variable in $Up_j \cap Up_{j-1}$ (the factor 2 comes again from Case 4). Hence the number of lower level variables is also bounded polynomially in $|A|$.

We have shown that the total size of every 2-level system $A_j$ ($1 \leq j \leq k$) is bounded polynomially in $|A|$. As the time needed to construct $A_{j+1}$ from $A_j$ is polynomially bounded by Proposition 4.26, we conclude that the overall running time of our algorithm is polynomially bounded as well. □
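
For readers who want to see where the constants come from, the two estimates of the runtime analysis can be unrolled as follows. This is only a restatement of the bounds above; it additionally assumes the immediate inequalities $|Up_0| \leq |A_0|$ and $|Type(3\text{-}5)_0| \leq |A_0|$, which the text uses implicitly.

```latex
\begin{align*}
|Up_j| &\le |Up_{j-1}| + 2\,|Type(3\text{-}5)_{j-1}|
        \le |Up_{j-1}| + 2\,|Type(3\text{-}5)_0| \\
       &\le |Up_0| + 2j\,|Type(3\text{-}5)_0|
        \le |A_0| + 2\cdot 2|A| \cdot |A_0|
        = |A_0|\,(4|A| + 1), \\[4pt]
|rhs_j(X)| &\le 2\,\bigl(3\,|A_0|\,(4|A| + 1) + 5\bigr)
            = 6\,|A_0|\,(4|A| + 1) + 10
            \qquad \text{for } X \in Up_j \setminus Up_{j-1}.
\end{align*}
```

The first line uses that the number of variables of type (3-5) never increases, the second uses $j \leq k \leq 2|A|$, and the last line doubles the bound for old variables.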

4.7 Lower bounds for regular linear orders 

In this section we prove lower bounds for the isomorphism problem for regular words. In fact, all these lower bounds only need a unary alphabet, i.e., they hold for regular linear orders. The results in this section nicely contrast with the results from Section 3, where we studied the isomorphism problem for the prefix order trees on regular languages. In this section, we replace the prefix order by the lexicographical order.

Theorem 4.27. The following problem is P-hard (and hence P-complete) for every finite alphabet $\Sigma$:

INPUT: Two succinct expressions $A_1$ and $A_2$ over the alphabet $\Sigma$.
QUESTION: $val(A_1) \cong val(A_2)$?

Proof. Note that the problem can be solved in polynomial time by Theorem 4.15. P-hardness will be shown by a reduction from the monotone circuit value problem. So, let C be a monotone Boolean circuit. We can assume that the gates of C are partitioned into layers $L_1, \ldots, L_n$, where $L_1$ contains all input gates, $L_n$ only contains the output gate, and all inputs for a gate from $L_{i+1}$ belong to $L_i$. Moreover, $L_i$ ($i > 1$) either contains only and-gates or only or-gates. We construct an SES A (over a unary terminal alphabet {a}), which contains for each gate v of C a variable $test_v$ and for each layer $d \in \{1, \ldots, n\}$ two variables $good_d$ and $bad_d$ such that the following holds for all gates $v \in L_d$:

(a) Either $val_A(test_v) \cong val_A(bad_d)$ or $val_A(test_v) \cong val_A(good_d)$.

(b) $val_A(test_v) \cong val_A(good_d)$ if and only if gate v evaluates to true.

(c) The linear orders $val_A(good_d)$ and $val_A(bad_d)$ do not contain an interval isomorphic to $\omega \cdot d$ (recall that $\omega \cdot d$ denotes the linear order $\omega + \cdots + \omega$ with d summands).

The base case for the first layer is trivial. Set $rhs_A(good_1) = a$ and $rhs_A(bad_1) = aa$. In other words, $val_A(good_1) = 1$ and $val_A(bad_1) = 2$. Moreover, $rhs_A(test_v) = a$ if $v \in L_1$ is a true-gate and $rhs_A(test_v) = aa$ if $v \in L_1$ is a false-gate.

Now assume that $v \in L_{d+1}$ is a gate with inputs $v_1, v_2 \in L_d$. For $n \in \mathbb{N}$ we use the abbreviation

$\omega \cdot n = a^\omega a^\omega \cdots a^\omega$ (n times).

Moreover, we write $\alpha + \beta$ for the concatenation $\alpha\beta$ of the regular expressions $\alpha$ and $\beta$ (which denote regular linear orders since the alphabet is unary). There are two cases:

Case 1. $L_{d+1}$ consists of and-gates. Then we set

$rhs_A(test_v) = [\omega \cdot d + test_{v_1}, \omega \cdot d + test_{v_2}, \omega \cdot d + good_d]^\eta$
$rhs_A(good_{d+1}) = [\omega \cdot d + good_d]^\eta$
$rhs_A(bad_{d+1}) = [\omega \cdot d + good_d, \omega \cdot d + bad_d]^\eta$.

Case 2. $L_{d+1}$ consists of or-gates. Then we set

$rhs_A(test_v) = [\omega \cdot d + test_{v_1}, \omega \cdot d + test_{v_2}, \omega \cdot d + bad_d]^\eta$
$rhs_A(good_{d+1}) = [\omega \cdot d + good_d, \omega \cdot d + bad_d]^\eta$
$rhs_A(bad_{d+1}) = [\omega \cdot d + bad_d]^\eta$.

The above three properties (a), (b), and (c) can be shown by induction on the layer. For layer $L_1$ all three properties are trivially true. Now, consider layer $L_{d+1}$. Property (a) follows directly from the induction hypothesis for layer $L_d$. Since the linear orders $val_A(good_{d+1})$ and $val_A(bad_{d+1})$ are shuffles, (c) holds for layer $L_{d+1}$ too. Finally, for (b) we consider two cases:

Case 1. $v \in L_{d+1}$ is an and-gate. Let $v_1, v_2 \in L_d$ be the inputs for v. First, assume that v evaluates to true. Then $v_1$ and $v_2$ both evaluate to true. Hence, by induction, we get $val_A(test_{v_1}) \cong val_A(test_{v_2}) \cong val_A(good_d)$. Thus,

$val_A(test_v) = [\omega \cdot d + val_A(test_{v_1}), \omega \cdot d + val_A(test_{v_2}), \omega \cdot d + val_A(good_d)]^\eta$
$\cong [\omega \cdot d + val_A(good_d)]^\eta$
$= val_A(good_{d+1})$.

For the other direction assume that

$val_A(test_v) = [\omega \cdot d + val_A(test_{v_1}), \omega \cdot d + val_A(test_{v_2}), \omega \cdot d + val_A(good_d)]^\eta$
$\cong [\omega \cdot d + val_A(good_d)]^\eta$.

Since neither $val_A(test_{v_1})$ nor $val_A(test_{v_2})$ nor $val_A(good_d)$ contains an interval isomorphic to $\omega \cdot d$, [18, Lemma 23] implies that

$\omega \cdot d + val_A(test_{v_1}) \cong \omega \cdot d + val_A(test_{v_2}) \cong \omega \cdot d + val_A(good_d)$.

This implies

$val_A(test_{v_1}) \cong val_A(test_{v_2}) \cong val_A(good_d)$.

Finally, the induction hypothesis yields that both $v_1$ and $v_2$, and hence also v, evaluate to true.

Case 2. $v \in L_{d+1}$ is an or-gate. We can use similar arguments as for Case 1. □
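
The reduction in the proof of Theorem 4.27 is mechanical enough to sketch in code. The following is an illustration under stated assumptions, not the paper's construction verbatim: the function name `reduce_circuit`, the input encoding of the layered circuit, and the textual rendering of right-hand sides (`w*d` for $\omega \cdot d$, `^eta` for the shuffle) are all chosen here for readability. Besides the right-hand sides, the sketch tracks whether each test variable takes the value of good or bad, using the fact that a shuffle depends only on the set of its components.

```python
def reduce_circuit(layers, inputs_of, input_value):
    """Build right-hand sides for a layered monotone circuit.

    layers[0]  : list of input gates (layer L_1 in the text)
    layers[i]  : pair (kind, gates) with kind in {"and", "or"} for i >= 1
    inputs_of  : gate -> (v1, v2), both from the previous layer
    input_value: input gate -> bool
    Returns (rhs, status); status[v] is "good" or "bad".
    """
    def shuffle(d, names):
        return "[" + ", ".join(f"w*{d} + {x}" for x in names) + "]^eta"

    rhs = {"good_1": "a", "bad_1": "aa"}
    status = {}
    for v in layers[0]:
        rhs[f"test_{v}"] = "a" if input_value[v] else "aa"
        status[v] = "good" if input_value[v] else "bad"

    for d, (kind, gates) in enumerate(layers[1:], start=1):
        # Components of good_{d+1} and bad_{d+1}, and the third component of test_v.
        if kind == "and":
            extra, good_parts, bad_parts = "good", {"good"}, {"good", "bad"}
        else:
            extra, good_parts, bad_parts = "bad", {"good", "bad"}, {"bad"}
        rhs[f"good_{d + 1}"] = shuffle(d, sorted(f"{s}_{d}" for s in good_parts))
        rhs[f"bad_{d + 1}"] = shuffle(d, sorted(f"{s}_{d}" for s in bad_parts))
        for v in gates:
            v1, v2 = inputs_of[v]
            rhs[f"test_{v}"] = shuffle(d, [f"test_{v1}", f"test_{v2}", f"{extra}_{d}"])
            parts = {status[v1], status[v2], extra}
            status[v] = "good" if parts == good_parts else "bad"
    return rhs, status


layers = [["x", "y"], ("or", ["g1", "g2"]), ("and", ["out"])]
inputs_of = {"g1": ("x", "y"), "g2": ("y", "y"), "out": ("g1", "g2")}
rhs, status = reduce_circuit(layers, inputs_of, {"x": True, "y": False})
print(rhs["test_out"])  # [w*2 + test_g1, w*2 + test_g2, w*2 + good_2]^eta
print(status["out"])    # bad
```

In the example, g1 = x OR y is true, g2 = y OR y is false, and the output AND-gate is false, so `test_out` takes the value of `bad_3`, in line with property (b).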

We do not know, whether the lower bound from Theorem 4.27 holds for ordinary ex- 
pressions too (instead of succinct expressions). 

Theorem 4.28. The following problem is P-hard (and hence P-complete):

INPUT: Two DFAs $A_1$ and $A_2$.
QUESTION: $(L(A_1); <_{lex}) \cong (L(A_2); <_{lex})$?

Proof. Note that by Theorem 4.1 the problem belongs to P. For P-hardness, it suffices by Theorem 4.27 to construct in logspace from a given succinct expression $\mathbb{A}$ (over a unary terminal alphabet) a DFA $\mathcal{A}$ such that the linear order $val(\mathbb{A})$ is isomorphic to $(L(\mathcal{A}); <_{lex})$. But this is accomplished by the construction in the proof of [29, Proposition 2]. □

Theorem 4.1 implies that it can be checked in EXPTIME whether the lexicographical 
orderings on two regular languages, given by NFAs, are isomorphic. We do not know 
whether this upper bound is sharp. Currently, we can only prove a lower bound of 
PSPACE: 

Theorem 4.29. The following problem is PSPACE-hard:

INPUT: Two NFAs $A_1$ and $A_2$.

QUESTION: $(L(A_1); <_{lex}) \cong (L(A_2); <_{lex})$?

Proof. We prove PSPACE-hardness by a reduction from the PSPACE-complete problem whether a given NFA A (over the terminal alphabet {a, b}) accepts $\{a, b\}^*$ [28]. So let A be an NFA over the terminal alphabet {a, b} and let K = L(A). Let $\Sigma = \{0, 1, a, b, \$_1, \$_2\}$ and fix the following order on $\Sigma$:

$\$_1 < 0 < 1 < \$_2 < a < b.$

Under this order, $(\{0, 1\}^* 1; <_{lex}) \cong (\{a, b\}^* b; <_{lex}) \cong \eta$.

It is straightforward to construct from A in logspace NFAs for the following languages:

$L_1 = \{a, b\}^* b \$_1$
$L_2 = K b \{0, 1\}^* 1$
$L_3 = \{a, b\}^* b \$_2$

$L = L_1 \cup L_2 \cup L_3$ (6)

It follows that

$(L; <_{lex}) \cong \sum_{w \in \{a,b\}^* b} \mathcal{L}(w)$

(the sum is taken over all words from $\{a, b\}^* b$ in lexicographic order), where

$\mathcal{L}(w) = 1 + \eta + 1$ if $w \in Kb$, and $\mathcal{L}(w) = 2$ otherwise.

Hence, if $K \neq \{a, b\}^*$, then $(L; <_{lex})$ contains an interval isomorphic to 2 and therefore is not dense. Hence $(L; <_{lex}) \not\cong \eta$. On the other hand, if $K = \{a, b\}^*$, then $(L; <_{lex}) \cong (1 + \eta + 1) \cdot \eta \cong \eta$. This proves the theorem. □
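
The role of the two markers $\$_1$ and $\$_2$ can be checked on small samples. The sketch below is only an illustration: the names `ORDER`, `key`, and `sample_block` are hypothetical, words are represented as lists of symbols, and the dense middle part of a block is sampled by finitely many bit strings instead of all of $\{0,1\}^* 1$.

```python
# Alphabet order from the proof: $1 < 0 < 1 < $2 < a < b.
ORDER = {"$1": 0, "0": 1, "1": 2, "$2": 3, "a": 4, "b": 5}


def key(word):
    """Sort key realizing the lexicographic order (a proper prefix is smaller)."""
    return tuple(ORDER[s] for s in word)


def sample_block(w, in_K, bit_strings):
    """Words of L that extend w (w ends in b): w$1, optionally w x for the
    sampled bit strings x ending in 1 (only if the block is of the dense kind),
    and w$2."""
    block = [w + ["$1"], w + ["$2"]]
    if in_K:
        block += [w + list(x) for x in bit_strings]
    return block


w = ["a", "b"]
words = sample_block(w, True, ["01", "1", "11"]) + [["a", "a", "b", "$2"], ["a", "b", "b", "$1"]]
for word in sorted(words, key=key):
    print("".join(word))
# aab$2, ab$1, ab01, ab1, ab11, ab$2, abb$1
```

The output shows that $w\$_1$ is the least and $w\$_2$ the greatest element among the sampled words extending w, and that words belonging to other blocks lie entirely before or after them.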

Remark 4.30. The proof of Theorem 4.29 shows that it is PSPACE-hard to check for a given NFA A whether $(L(A); <_{lex}) \cong \eta$. In fact, this problem is PSPACE-complete, since we can check in polynomial space whether $(L(A); <_{lex}) \cong \eta$: In polynomial time, we can construct an NFA B that accepts a convolution of two words* $u \otimes v$ if and only if $u, v \in L(A)$ and there exist words $w_1, w_2, w_3 \in L(A)$ such that $w_1 <_{lex} u <_{lex} w_2$ and ($v \leq_{lex} u$ or $u <_{lex} w_3 <_{lex} v$). Then, $(L(A); <_{lex}) \cong \eta$ if and only if B accepts the set of all convolutions $u \otimes v$ with $u, v \in L(A)$. The latter can be checked in polynomial space.
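
The convolution $u \otimes v$ used in Remark 4.30 (and defined in its footnote) pads the shorter word with a dummy symbol and pairs up the letters. A minimal sketch, assuming words are Python strings and `#` does not occur in them; the function name `convolution` is chosen here for illustration:

```python
from itertools import zip_longest


def convolution(u, v, pad="#"):
    """Return the convolution of u and v as a list of letter pairs.

    The shorter word is padded with the dummy symbol so that both
    components have length max(len(u), len(v)).
    """
    return list(zip_longest(u, v, fillvalue=pad))


print(convolution("ab", "abba"))
# [('a', 'a'), ('b', 'b'), ('#', 'b'), ('#', 'a')]
```

An automaton reading such pairs sees both words synchronously, which is what allows a single NFA to compare u and v lexicographically.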

Remark 4.31. In [9] it is shown that the problem whether, for a given context-free language L, the linear order $(L; <_{lex})$ is isomorphic to $\eta$ is undecidable. This result is shown by a reduction from Post's correspondence problem. Note that this result can also be easily deduced using the technique from the above proof: If we start with a pushdown automaton A instead of an NFA, then the language L from (6) is context-free. Hence, $(L; <_{lex}) \cong \eta$ if and only if $L(A) = \{a, b\}^*$. The latter property is a well-known undecidable problem.



* The convolution of the words $a_1 a_2 \cdots a_m$ and $b_1 b_2 \cdots b_n$ is the word
$(a_1, b_1)(a_2, b_2) \cdots (a_k, b_k)$, where $k = \max\{m, n\}$, $a_i = \#$ (a dummy symbol) for
$m < i \leq k$ and $b_i = \#$ for $n < i \leq k$.

In Section 3 we also studied the isomorphism problem for finite trees that are succinctly given by the prefix order on the finite language accepted by a DFA (resp., NFA). To complete the picture, we will finally consider the isomorphism problem for linear orders that consist of a lexicographically ordered finite language, where the latter is represented by a DFA (resp., NFA). Of course, this problem is somewhat trivial, since two finite linear orders are isomorphic if and only if they have the same cardinality. Hence, we have to consider the problem whether two given acyclic DFAs (resp., NFAs) accept languages of the same cardinality.

Proposition 4.32. It is C=L-complete (resp., C=P-complete) to check whether two given acyclic DFAs (resp., acyclic NFAs) accept languages of the same size.

Proof. The upper bounds are easy: There exists a nondeterministic polynomial time (resp., logspace) machine which gets an NFA (resp., a DFA) A over an alphabet $\Sigma$ as input and has precisely $|L(A)|$ many accepting paths. Let n be the number of states of A. The machine first branches nondeterministically for at most $n \cdot \log(|\Sigma|)$ steps and thereby produces a word $w \in \Sigma^*$ of length at most n. Then it checks whether $w \in L(A)$ and only accepts if this holds. The checking step can be done in deterministic polynomial time for an NFA and in deterministic logspace for a DFA.

For the lower bound, we first consider the DFA-case. Given two nondeterministic logspace machines $M_1, M_2$ (over the same input alphabet) together with an input w, we can produce in logspace the configuration graphs $G_1$ and $G_2$ of $M_1$ and $M_2$, respectively, on input w. W.l.o.g. we can assume that $G_1$ and $G_2$ are acyclic (one can add a step counter to $M_i$). Now, from $G_i$ it is straightforward to construct an acyclic DFA $A_i$ such that $|L(A_i)|$ is the number of paths in $G_i$ from the initial configuration to the (w.l.o.g. unique) accepting configuration. The latter number is the number of accepting computations of $M_i$ on input w.

Finally, C=P-hardness for NFAs follows from [16, Theorem 2.1], where it was shown that counting the number of words accepted by an NFA is #P-complete. □
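
For acyclic DFAs, the cardinality of the accepted language equals the number of paths from the initial state to a final state, since distinct words follow distinct paths. The following sketch counts these paths by memoized recursion. It is a plain deterministic illustration, not the logspace counting machine from the proof, and the representation of the transition function as nested dictionaries is an assumption made here.

```python
from functools import lru_cache


def count_words(delta, initial, finals):
    """Number of words accepted by an acyclic DFA.

    delta  : dict mapping a state to a dict {symbol: successor state}
             (partial transition function, the graph must be acyclic)
    finals : set of final states
    """
    @lru_cache(maxsize=None)
    def paths(q):
        # Count the empty continuation if q is final, plus all longer ones.
        total = 1 if q in finals else 0
        for succ in delta.get(q, {}).values():
            total += paths(succ)
        return total

    return paths(initial)


# Accepts {a, b, ab}.
dfa1 = {0: {"a": 1, "b": 2}, 1: {"b": 2}}
# Accepts {aa, ab, ba}.
dfa2 = {0: {"a": 1, "b": 2}, 1: {"a": 3, "b": 3}, 2: {"a": 3}}
print(count_words(dfa1, 0, frozenset({1, 2})))  # 3
print(count_words(dfa2, 0, frozenset({3})))     # 3
```

Both example automata accept three words, so the corresponding lexicographically ordered languages are isomorphic as finite linear orders.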

4.8 Ordered trees 

Let us briefly discuss the isomorphism problem for ordered regular trees, i.e., regular trees where the children of a node are linearly ordered. An ordered tree can be viewed as a triple $(A; \leq, R)$, where $(A; \leq)$ is a tree as defined in Section 2.3 and the binary relation R is the disjoint union of relations $R_a$ ($a \in A$), where $R_a$ is a linear order on the children of a. Now, assume that $\mathcal{A}$ is a (deterministic or nondeterministic) finite automaton with input alphabet $\Sigma$ and let $<_\Sigma$ be a linear order on $\Sigma$. Assume that $\varepsilon \in L(\mathcal{A})$. Then, we can define a finitely branching ordered regular tree $oT(\mathcal{A}, <_\Sigma)$ with root $\varepsilon$ as follows:

$oT(\mathcal{A}, <_\Sigma) = (L(\mathcal{A}); \leq_{pref}, \bigcup_{u \in L(\mathcal{A})} R_u)$,

where $R_u$ is the relation

$R_u = \{(v, w) \mid v, w \text{ are children of } u \text{ in } (L(\mathcal{A}); \leq_{pref}),\ v <_{lex} w\}$.

This means that we order the children of a node $u \in L(\mathcal{A})$ lexicographically. In the following, we will omit the order $<_\Sigma$ on the alphabet. The proof of the following result combines ideas from the proof of Theorem 3.1 with Theorem 4.1.

Proposition 4.33. The following problem is P-complete:

INPUT: Two DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$ with $\varepsilon \in L(\mathcal{A}_1) \cap L(\mathcal{A}_2)$.
QUESTION: $oT(\mathcal{A}_1) \cong oT(\mathcal{A}_2)$?

Proof. Similarly to the proof of Theorem 3.1, it suffices to take a DFA $\mathcal{A} = (Q, \Sigma, \delta, F)$ without initial state and two states $p, q \in F$, and to check in polynomial time whether $oT(\mathcal{A}, p) \cong oT(\mathcal{A}, q)$, where $oT(\mathcal{A}, r) = oT(Q, \Sigma, \delta, r, F)$ for $r \in F$. Define the following equivalence relation on F:

$iso = \{(p, q) \in F \times F \mid oT(\mathcal{A}, p) \cong oT(\mathcal{A}, q)\}$.

We show that iso can be computed in polynomial time. As in the proof of Theorem 3.1, this will be done with a partition refinement algorithm. We need a few definitions.

Recall from the proof of Theorem 3.1 the definition of the languages $L(\mathcal{A}, p, C)$ and $K(\mathcal{A}, p, C) \subseteq L(\mathcal{A}, p, C)$ for $p \in F$ and $C \subseteq F$. Assume that R is an equivalence relation on F and let m be the number of equivalence classes of R. Fix an arbitrary bijection f between the alphabet $\{1, \ldots, m\}$ and the set of equivalence classes of R. With R and $p \in F$ we associate a partitioned DFA $\mathcal{A}(p, R)$ as follows: Take the DFA for the language $L(\mathcal{A}, p, F)$ as defined in the proof of Theorem 3.1 and set $F_i = f(i)$ ($1 \leq i \leq m$), which is the set of final states associated with symbol i. Finally, define the regular word $w(p, R) = w(\mathcal{A}(p, R))$ over the alphabet $\{1, \ldots, m\}$. We define the new equivalence relation $\widetilde{R}$ on F as follows:

$\widetilde{R} = \{(p, q) \in R \mid w(p, R) \cong w(q, R)\}$.

Thus, $\widetilde{R}$ is a refinement of R which, by Theorem 4.1, can be computed in polynomial time from R. Let us define a sequence of equivalence relations $R_0, R_1, \ldots$ on F as follows: $R_0 = F \times F$, $R_{i+1} = \widetilde{R_i}$. Then, there exists $k \leq |F|$ such that $R_k = R_{k+1}$. We claim that $R_k = iso$.

For the inclusion $iso \subseteq R_k$, one shows, by induction on i, that $iso \subseteq R_i$ for all $1 \leq i \leq k$. The point is that for every equivalence relation R on F with $iso \subseteq R$, we also have $iso \subseteq \widetilde{R}$. To see this, assume that $iso \subseteq R$ but there is $(p, q) \in iso$ which does not belong to $\widetilde{R}$. Since (p, q) belongs to R, we must have $w(p, R) \not\cong w(q, R)$. On the other hand, since $(p, q) \in iso$, it follows that the regular words w(p, iso) and w(q, iso) are isomorphic. But since $iso \subseteq R$, w(p, R) is a homomorphic image of w(p, iso), and similarly for w(q, R). Thus, also w(p, R) and w(q, R) are isomorphic, which is a contradiction.

For the inclusion $R_k \subseteq iso$, we show that if R is an equivalence relation on F such that $R = \widetilde{R}$ (this holds for $R_k$), then $R \subseteq iso$. For this, take a pair $(p_1, p_2) \in R$. Take the tree $oT(\mathcal{A}, p_i)$. We assign types in form of final states to the nodes of $oT(\mathcal{A}, p_i)$ in the same way as in the proof of Theorem 3.1. We now construct an isomorphism $f : oT(\mathcal{A}, p_1) \to oT(\mathcal{A}, p_2)$ as the limit of isomorphisms $f_n$, $n \geq 1$. Here, $f_n$ is an isomorphism between the trees that result from $oT(\mathcal{A}, p_1)$ and $oT(\mathcal{A}, p_2)$ by cutting off all nodes below level n. Let us call these trees $oT(\mathcal{A}, p_i)\restriction n$ ($i \in \{1, 2\}$). Moreover, if $f_n$ maps a node $u_1$ of type $q_1$ to a node $u_2$ of type $q_2$, then we will have $(q_1, q_2) \in R$. Assume that $f_n$ is already constructed and let $u_1$ of type $q_1$ be a leaf of $oT(\mathcal{A}, p_1)\restriction n$. Let $u_2 = f_n(u_1)$ be of type $q_2$. Then we have $(q_1, q_2) \in R$ and hence the regular words $w(q_1, R)$ and $w(q_2, R)$ are isomorphic. Let g be an isomorphism. The elements of these regular words correspond to the children of $u_1$ and $u_2$, respectively. More precisely, if $v_i$ belongs to the domain of $w(q_i, R)$, then $u_i v_i$ is a child of $u_i$ and vice versa. Clearly, g can also be viewed as an isomorphism between the lexicographical orderings on the children of $u_1$ and $u_2$, respectively. Moreover, by definition of the regular words $w(q_1, R)$ and $w(q_2, R)$, if g maps some $u_1 v_1$ of type $r_1$ to $u_2 v_2$ of type $r_2$, then $(r_1, r_2) \in R$. By choosing such an isomorphism g for every pair $(u_1, f_n(u_1))$ of leaves in $oT(\mathcal{A}, p_1)\restriction n$ and $oT(\mathcal{A}, p_2)\restriction n$, respectively, we can extend $f_n$ to $f_{n+1}$. □

Let us now consider prefix-closed automata. Here, we can improve the upper bound from Proposition 4.33 to NL.

Proposition 4.34. The following problem is NL-complete:

INPUT: Two prefix-closed DFAs $\mathcal{A}_1$ and $\mathcal{A}_2$.
QUESTION: $oT(\mathcal{A}_1) \cong oT(\mathcal{A}_2)$?

Proof. Again, it suffices to take a prefix-closed DFA $\mathcal{A} = (Q, \Sigma, \delta, Q)$ without initial state and two states $p, q \in Q$, and to check in NL whether $oT(Q, \Sigma, \delta, p, Q) \cong oT(Q, \Sigma, \delta, q, Q)$. By the complement closure of NL, it suffices to check nondeterministically in logarithmic space whether $oT(Q, \Sigma, \delta, p, Q) \not\cong oT(Q, \Sigma, \delta, q, Q)$. This can be done as follows: Let $a_1 < a_2 < \cdots < a_m$ and $b_1 < b_2 < \cdots < b_n$ be the transition labels of the outgoing transitions of p and q, respectively. If $m \neq n$, then clearly $oT(Q, \Sigma, \delta, p, Q) \not\cong oT(Q, \Sigma, \delta, q, Q)$ and the algorithm can accept. If $n = m$, then $oT(Q, \Sigma, \delta, p, Q) \not\cong oT(Q, \Sigma, \delta, q, Q)$ if and only if there exists $1 \leq i \leq m$ such that $oT(Q, \Sigma, \delta, \delta(p, a_i), Q) \not\cong oT(Q, \Sigma, \delta, \delta(q, b_i), Q)$. Hence, the algorithm will simply guess $1 \leq i \leq m$ and replace the state pair (p, q) by $(\delta(p, a_i), \delta(q, b_i))$. In this way, the algorithm only has to store two states of $\mathcal{A}$, which is possible in logspace.
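
The nondeterministic procedure above can be simulated deterministically by exploring all state pairs that the guesses could reach. The sketch below does this with a breadth-first search. It is a polynomial-time illustration rather than the logspace algorithm itself, and the function name and the dictionary encoding of the transition function are assumptions made here.

```python
from collections import deque


def ordered_trees_isomorphic(delta, p, q):
    """Decide whether the ordered trees unfolded from states p and q of a
    prefix-closed DFA (all states final) are isomorphic.

    delta: dict mapping a state to a dict {symbol: successor}; symbols must
           be comparable, and their order is the order on the alphabet.
    """
    seen = {(p, q)}
    queue = deque([(p, q)])
    while queue:
        x, y = queue.popleft()
        # Successors listed in the order of their transition labels.
        xs = [delta[x][a] for a in sorted(delta.get(x, {}))]
        ys = [delta[y][b] for b in sorted(delta.get(y, {}))]
        if len(xs) != len(ys):
            return False  # different numbers of children: not isomorphic
        for pair in zip(xs, ys):  # the i-th child must match the i-th child
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True


delta = {
    0: {"a": 0, "b": 0},   # full binary tree
    1: {"a": 2, "b": 1},   # also a full binary tree, with two states
    2: {"a": 1, "b": 2},
    3: {"a": 3, "b": 4},   # not full: state 4 has only one child
    4: {"a": 4},
}
print(ordered_trees_isomorphic(delta, 0, 1))  # True
print(ordered_trees_isomorphic(delta, 0, 3))  # False
```

The search visits at most $|Q|^2$ pairs, whereas the NL algorithm follows a single guessed path through the same pair graph and therefore stores only the current pair.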

NL-hardness can be shown by a reduction from the complement of the graph accessibility problem. Take a directed graph G = (V, E) and two nodes $s, t \in V$. Add to each node of V loops, so that every node $v \in V \setminus \{t\}$ has outdegree n (where n can be taken as the maximal outdegree of a node of G) and t has outdegree n + 1. Then label the edges of the resulting multigraph arbitrarily by symbols so that we obtain a DFA $\mathcal{A}$ (the initial state is s and all states are final). Then there is no path from s to t in G if and only if the tree $oT(\mathcal{A})$ is a full n-ary tree. □

Corollary 4.35. The following problem is PSPACE-complete:

INPUT: Two prefix-closed NFAs $\mathcal{A}_1$ and $\mathcal{A}_2$.
QUESTION: $oT(\mathcal{A}_1) \cong oT(\mathcal{A}_2)$?

Proof. The PSPACE upper bound follows from Proposition 4.34, using Lemma 2.1 and the obvious fact that the power set automaton of a given NFA can be produced by a PSPACE-transducer. For the PSPACE lower bound, note that for an NFA $\mathcal{A}$ over an alphabet $\Sigma$ we have $L(\mathcal{A}) = \Sigma^*$ if and only if $oT(\mathcal{A})$ is a full $|\Sigma|$-ary tree. But universality for NFAs is PSPACE-complete [28]. □

5 Conclusion and open problems 

Table 1 (Table 2) summarizes our complexity results for the isomorphism problem for regular trees (regular linear orders). Let us conclude with some open problems. As can be seen from Table 2, there is a complexity gap for the isomorphism problem for regular linear orders that are represented by NFAs. This problem belongs to EXPTIME and is PSPACE-hard. Another interesting problem concerns the equivalence problem for straight-line programs (i.e., succinct expressions that generate finite words, or equivalently, acyclic partitioned DFAs, or equivalently, context-free grammars that generate a single word). Plandowski has shown that this problem can be solved in polynomial time. Recall that this result is fundamental for our polynomial time algorithm for succinct expressions (Theorem 4.15). In [10], it was conjectured that the equivalence problem for straight-line programs is P-complete, but this is still open.